INTRODUCTION


A.  REPORT  BACKGROUND 


• This  report  introduces  data  base  concepts  and  presents  a methodology  for 
evaluating  Data  Base  Management  Systems  (DBMS)  available  today. 

• The report examines a variety of problems encountered by most data
processing installations, and uses these to establish criteria which DBMS
should address to be effective in resolving the problems. In addition, the
report presents data base concepts and describes the many functions which
should be provided by a DBMS.

• The  criteria  can  be  used  to  assess  a number  of  DBMS  products.  No  attempt  is 
made  to  evaluate  each  product  against  any  other.  Instead,  several  popular 
DBMS  products  are  described  in  Section  IV.  For  each,  there  is  an  overview  of 
the  product  which  provides  a common  frame  of  reference.  Following  the 
product  overview,  each  product  is  then  evaluated  in  terms  of  its  ability  to 
satisfy  the  criteria.  Products  analyzed  and  their  suppliers  are: 

ADABAS, Software AG

DMS II, Burroughs

DMS 170 (CODASYL), Control Data

DMS 1100 (CODASYL), Univac

IDMS (CODASYL), Cullinane

IMS (DL/I), IBM

SYSTEM 2000, MRI Systems

TOTAL, Cincom Systems

The  evaluation  criteria  and  product  overviews  can  be  used  to  evaluate  any  two 
or  more  products. 


Each  installation  may  have  different  DBMS  requirements. 

Having  determined  the  requirements  and  identified  the  criteria  which 
best  address  those  requirements,  the  relevant  criteria  are  then  used  to 
make  a product  evaluation  to  satisfy  the  unique  installation  needs. 


As a further aid in planning for future requirements, this report provides, in
separate sections, a method for performing data analysis prior to the
establishment of a DBMS and a review of expected trends in data base software
and associated application areas.


This report is provided as part of the Planning Service For
Computer/Communication Users and was prepared in conjunction with Infocom
Australia, an affiliate of INPUT.



© 1978  by  INPUT,  Menlo  Park,  CA  94025.  Reproduction  Prohibited. 


INPUT 


B.  HISTORY  OF  DATA  BASE 


• During  the  1960s,  computers  were  used  for  clerical  and  accounting  functions, 
taking  over  much  of  the  work  previously  processed  by  accounting  machines. 
As  managers  became  more  aware  of  the  power  of  computers,  they  saw 
opportunities  to  use  them  to  analyze  and  consolidate  the  vast  masses  of  data 
generated  in  day-to-day  operations. 

• As applications were brought onto computers, application programs defined
the data necessary for processing that application. Thus, for example,
separate data were required for order entry applications, for payroll
applications, etc.

• As more applications were developed for computer processing, it was found
that the same data were often required for two or more different applications
in a company. For example, customer data required for an order entry
application were also needed when accounts receivable applications were
placed on the computer.

• As  each  new  application  was  automated,  a decision  was  made  as  to  the  data  to 
be  used  by  that  application.  This  decision  generally  considered  whether  to 
incorporate  the  new  data  in  with  existing  files  to  avoid  having  to  duplicate 
common  data,  or  alternatively  to  create  completely  new  data  files  which 
contained  a duplicate  of  the  common  data. 

• In  making  a decision  to  consolidate  new  data  in  with  existing  files,  the  impact 
of  a change  to  the  file  on  existing  application  programs  had  to  be  assessed.  In 
most  instances,  the  extent  of  modification  to  existing  application  programs 
suggested  that  the  new  application  being  implemented  should  define  its  own, 
separate  data  files  rather  than  be  incorporated  in  existing  files. 

• In the event of a change occurring in common data, that change had to be
reflected in each duplicate copy of the common data. This was, however,
considered to be the lesser of two evils. The new application was generally
implemented separately from existing applications, thereby avoiding
modification of existing applications.

• As data proliferated and appeared in several versions for different
applications, the impact of a change to the data became a serious problem. In
addition, the need to change applications and data was usually dictated by
factors outside the control of data processing managers.

• As  a result,  data  processing  installations  found  that  a large  percentage  of  their 
time  and  resources  (50%  or  more  of  their  programmer  resources)  were  spent  in 
modifying  or  maintaining  existing  applications,  rather  than  developing  new 
applications  for  computer  processing. 

• The effect of maintenance, involving such a large proportion of the resources
of a data processing installation, has often been to delay the implementation
of new applications. Yet it is through new applications that further cost
savings or productivity benefits come, as shown in Exhibit I-1.


C. INFORMATION  FOR  DECISION-MAKING 


• Initially, management installed computers to process not only the day-to-day
operations of the organization's applications, but also to extract data from
those applications which would assist in management control or
decision-making.

• However, this latter use of computers has generally not been realized.

EXHIBIT I-1

APPLICATION DEVELOPMENT RESOURCE UTILIZATION
WITHOUT DATA BASE MANAGEMENT

• While there are often many reasons why computers have not been used
effectively to provide information to management, most of these reasons lead
to a common problem. As new applications are placed on computers and new
data files created for those applications, the ability to cross-relate data across
many applications becomes more difficult.

• In  an  attempt  to  address  the  problems  of  applications,  data  maintenance  and 
cross-relationships  of  data,  the  "Data  Base  Management  System  (DBMS)" 
concept  emerged  in  the  1960s  and  has  become  increasingly  accepted  in  the  last 
decade. 

D.  THE  DATA  BASE  TECHNIQUE 


• It  is  the  transformation  of  data  into  information  that  DBMS  address.  By 
providing  an  interface  between  application  programs  and  the  data,  DBMS 
control  and  manage  the  organization  of  that  data  in  the  computer,  and  present 
the  data  to  application  programs  in  the  form  requested. 

• A DBMS  generally  relieves  the  application  program  of  the  need  to  know  the 
location  of  data  stored  in  the  data  base.  It  locates  requested  data  and 
presents  it  to  the  program  in  a form  suitable  for  processing. 

• As the program is no longer dependent upon a particular data storage
technique, application changes which dictate a need to modify the storage of
data can be accommodated without impacting the application program
significantly. Provided the application program can request data by name, the
location of that requested data can change, and be retrieved by the DBMS
without affecting the application program itself.

• The theoretical result is a reduction in the amount of program maintenance
necessary to accommodate application changes. The programming resources
freed from maintenance can then be used for new application development, as
pictured in Exhibit I-2.
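
The idea of requesting data by name, independent of where it is stored, can be illustrated with a minimal sketch. It is not drawn from any product described in this report: the class TinyDBMS, its methods, and the file and field names are all hypothetical, and a real DBMS does far more. The sketch only shows that an application which asks for "credit_limit" by name keeps working after that field is moved to a different file.

```python
# Hypothetical illustration only: TinyDBMS and all names below are invented.
class TinyDBMS:
    """Maps logical field names to wherever the values are physically kept."""

    def __init__(self):
        self._files = {}    # physical storage: file name -> {key: record dict}
        self._catalog = {}  # logical field name -> (file name, physical field)

    def define(self, field, file_name, physical_field):
        """Record where a named field is stored."""
        self._catalog[field] = (file_name, physical_field)
        self._files.setdefault(file_name, {})

    def put(self, key, field, value):
        file_name, physical_field = self._catalog[field]
        self._files[file_name].setdefault(key, {})[physical_field] = value

    def get(self, key, field):
        file_name, physical_field = self._catalog[field]
        return self._files[file_name][key][physical_field]

    def relocate(self, field, new_file, new_physical_field):
        """Move a field to another file without changing its logical name."""
        old_file, old_physical = self._catalog[field]
        self._files.setdefault(new_file, {})
        for key, record in self._files[old_file].items():
            if old_physical in record:
                value = record.pop(old_physical)
                self._files[new_file].setdefault(key, {})[new_physical_field] = value
        self._catalog[field] = (new_file, new_physical_field)


def credit_report(dbms, customer_key):
    """Application code: asks for data by name only, never by location."""
    name = dbms.get(customer_key, "customer_name")
    limit = dbms.get(customer_key, "credit_limit")
    return f"{name}: {limit}"


dbms = TinyDBMS()
dbms.define("customer_name", "ORDER_ENTRY_FILE", "NAME")
dbms.define("credit_limit", "ORDER_ENTRY_FILE", "CRLIM")
dbms.put("C001", "customer_name", "Acme")
dbms.put("C001", "credit_limit", 5000)

before = credit_report(dbms, "C001")
# The storage layout changes; credit_report() itself is not modified.
dbms.relocate("credit_limit", "RECEIVABLES_FILE", "CREDIT_LIMIT")
after = credit_report(dbms, "C001")
```

In this sketch the catalog plays the role of the interface between program and data: only the catalog entry changes when storage is reorganized, so the application function is untouched.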

E.  DATA  BASE  SYSTEMS  IN  THE  1970s 


• Of  the  early  data  base  management  techniques  developed,  two  have  become 
the  most  widely  used  techniques  today,  while  the  third  has  been  used  as  a basis 
for  specification  of  a proposed  common  data  base  language.  The  three  data 
base  management  techniques  referred  to  and  the  data  base  systems  which  used 
them  in  the  late  1960s  are: 

Hierarchical - IBM's Information Management System (IMS).

Chained File - IBM's Bill of Material Processor (BOMP).

Network - Honeywell's Integrated Data Store (IDS).

• These  early  DBMS  all  became  available  around  the  1965-1968  time  frame  and 
formed  the  basis  for  a number  of  today's  systems. 

IBM's  Information  Management  System  (IMS)  used  a data  base  language 
referred  to  as  Data  Language/I  (DL/I).  DL/I  has  become  the  basis  not 
only  for  the  current  version  of  IMS  under  OS/VS  (IMS/VS),  but  is  also 
available  on  DOS/VS  systems  as  DL/I  ENTRY  and  DL/I  DOS/VS. 

IBM's Bill of Material Processor (BOMP) was developed primarily to
meet the data base requirements of manufacturing applications. From
BOMP, IBM developed its Data Base Organization and Maintenance
Processor (DBOMP), with additional enhancements to address the needs of
manufacturing applications.

EXHIBIT I-2

APPLICATION DEVELOPMENT RESOURCE UTILIZATION
WITH DATA BASE MANAGEMENT

[Chart not reproduced; axis label: MANPOWER]


IBM  has  de-emphasized  DBOMP  in  favor  of  DL/I  as  their  prime  data 
base  system  because  of  significant  advantages  which  they  claim  are 
offered  by  DL/I  over  DBOMP. 

An  independent  development  of  a data  base  system  which  grew  from 
the  concepts  of  BOMP  is  TOTAL,  marketed  by  Cincom  Systems. 
TOTAL  and  DL/I  are  the  two  most  widely  used  data  base  systems  today. 

Honeywell's  Integrated  Data  Store  was  used  as  the  basis  for  the 
specifications  of  a common  data  base  language  by  the  CODASYL 
Committee. 

The  early  1970s  have  seen  the  development  of  a number  of  DBMS,  some  of 
which  are: 

Data Base Management System     Supplier

DL/I (IMS)                      IBM
ADABAS                          Software AG
TOTAL                           Cincom Systems
System 2000                     MRI Systems
DMS 1100                        Univac
IDS                             Honeywell
IDMS                            Cullinane
DMS II                          Burroughs
DBMS 10, DBMS 20                Digital Equipment
DBMS 11                         Digital Equipment
DMS 170                         Control Data
IMAGE/1000, IMAGE/3000          Hewlett-Packard
INFOS                           Data General


II EXECUTIVE SUMMARY


A.  SCOPE  AND  KEY  ISSUES 


Data  Base  Management  Systems  (DBMS)  offer  the  benefit  of  faster  application 
development  and  easier  maintenance,  while  permitting  the  diverse  data 
requirements  of  user  organizations  to  be  inter-related  more  effectively  than 
through  the  use  of  traditional  application  program  development  techniques. 

Application program development and maintenance are very labor-intensive,
requiring highly paid, skilled personnel. With salary costs increasing while
computer hardware and communication costs are decreasing, many
organizations are finding that application development and maintenance costs
often exceed the cost of equipment and communications by a factor of two or
three.

Reflecting this imbalance in costs, an increasing number of organizations are
moving to methods that will reduce application development and maintenance
costs. Data Base techniques present an opportunity for organizations to
control these costs more effectively.

Recognizing the growing importance of Data Base Management Systems,
INPUT predicted in "Data Base Management Software Markets" (May 1978)
that the market for DBMS products in the U.S. is increasing at 48% per year
and is expected to continue at this rate to 1983 (see Exhibit II-1).

EXHIBIT II-1

DBMS LEASE/PURCHASE/MAINTENANCE EXPENDITURES GROWTH
1977 TO 1983

                ------- 1977 ($ MILLIONS) --------   ------- 1983 ($ MILLIONS) --------   AVERAGE ANNUAL
SIZE OF DBMS    LEASE  PURCHASE  MAINTENANCE  TOTAL   LEASE  PURCHASE  MAINTENANCE  TOTAL   GROWTH RATE (%)

LARGE            $36      $15         $3       $54    $200     $165        $65      $430         41%
MEDIUM            14       10          4        28     135       80         30       250         44
SMALL              -        3          1         4       -      100         85       185         90
VERY SMALL         -        0          -         0       -       25          -        25          ∞
TOTAL            $50      $28         $8       $86    $335     $370       $180      $890         48%
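
The growth rates in the exhibit above can be checked with a short calculation. The sketch below assumes that the "average annual growth rate" is a compound rate over the six years from 1977 to 1983; the report does not state its formula, and the function name is illustrative. Under that assumption the total, large, and medium rows reproduce the published 48%, 41%, and 44%, and the small row comes out close to the published 90%. The very small row starts from zero, so no finite rate can be computed for it.

```python
def average_annual_growth(start, end, years):
    """Compound annual growth rate as a fraction (0.48 means 48% per year)."""
    if start <= 0:
        raise ValueError("growth rate is undefined when the starting value is zero")
    return (end / start) ** (1 / years) - 1


# Total expenditures in $ millions from the exhibit: (1977, 1983).
totals = {
    "LARGE": (54, 430),
    "MEDIUM": (28, 250),
    "SMALL": (4, 185),
    "TOTAL": (86, 890),
}

years = 1983 - 1977
rates = {size: average_annual_growth(first, last, years)
         for size, (first, last) in totals.items()}

for size, rate in rates.items():
    print(f"{size:<7}{rate:7.1%}")
```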
Data Base techniques have been evolving such that today there are four main
categories of DBMS products available.


CATEGORY          REPRESENTATIVE DBMS PRODUCTS

Hierarchical      IMS/VS, DL/I

Network           TOTAL, IMAGE/1000, IMAGE/3000, INFOS

  - CODASYL       DMS 90, DMS 1100, IDMS, DMS 170, DBMS 10, DMS II

Inverted File     ADABAS, SYSTEM 2000, DATACOM


• While  these  categories  originally  offered  a high  degree  of  product  distinction, 
many  products  are  now  providing  facilities  offered  by  products  in  other 
categories  resulting  in  less  separation  between  products. 


• Organizations  are  faced  with  a bewildering  array  of  DBMS  products.  This 
report  has  consolidated  material  available  from  a variety  of  sources  on  many 
products,  for  the  purpose  of  permitting  users  to  make  an  independent 
assessment  of  the  different  DBMS  products. 


• Furthermore, this report presents an evaluation methodology that enables
users to identify those products that best match their organization's unique
requirements. The reviews of each product in terms of objective criteria
provided in this report are an aid to organizations in making this assessment.


B. DBMS - USER CONSIDERATIONS


• The early 1970s saw the development of a number of DBMS by computer
manufacturers, computer users, and independent software companies. In
addition, remote computing services firms developed proprietary
implementation languages that contained DBMS capabilities.


• Computer  manufacturers  currently  have  44%  of  the  installations  of  DBMS  in 
the  U.S. 


IBM,  with  IMS  (DL/I)  has  the  largest  number  of  DBMS  installations 
(1,100)  of  the  computer  manufacturers  and  the  highest  revenue. 


Hewlett-Packard  and  Data  General  in  the  minicomputer  area  offer 
DBMS;  HP  has  almost  as  many  installations  (1,000)  for  IMAGE  3000  as 
IBM's  IMS. 

Univac  and  Honeywell  have  provided  DBMS  software  to  their  users 
without  charge. 


Honeywell  introduced  the  first  DBMS  package  (IDS)  in  1963  and  has 
over  700  installations. 


Burroughs,  Control  Data,  Honeywell,  and  DEC  have  all  introduced 
DBMS  products  in  the  last  two  years. 

• Independent software companies have been able to take advantage of the
historical lack of availability and the performance and feature problems of the
computer manufacturers' products in order to build the majority market share.


Cincom, with TOTAL, has over 30% of the total number of DBMS
installations.

ADABAS and IDMS are gaining significantly in the marketplace for
current IBM products and have started replacing some IMS installations.

• Remote computing services vendors have developed their own implementation
languages/DBMS, as CSC, Tymshare, and National CSS have done, or
commissioned them from a specialist software company, as CDC/SBC and ADP
Network Services have done, or licensed generally available products such as
IMS, TOTAL, and System 1022.

RCS  companies  have  not  sold  proprietary  DBMS  as  software  packages. 

Recent  announcements  by  ADP  and  NCSS  indicate  that  these  vendors 
will  sell  their  software  in  a "package"  with  hardware. 

• Exhibit II-2 provides a matrix comparison of DBMS products versus hardware
compatibility.

Fifty-six percent (9 of 16 products) run on IBM or IBM-compatible
hardware.

Four IBM-compatible products (TOTAL, IDMS, SYSTEM 2000, ADABAS)
accounted for 44% of 1977 DBMS sales.

TOTAL  offers  the  highest  degree  of  machine  portability.  It  is  capable 
of  operating  on  70%  of  the  hardware  listed. 

SEED  offers  similar  machine  portability  (it  can  run  on  any  system  that 
supports  FORTRAN).  It  is  a newer  product  and  has  not  been  installed 
on  nearly  as  many  systems  as  TOTAL. 

• In a recent INPUT study (second quarter 1978), 71% of the companies with a
DBMS installed reported they had conducted an extensive analysis of DBMS
products before making a final decision. This is evidence that:

The DBMS market is highly competitive.

DP managers are concerned about the impact of a DBMS and carefully
examine alternatives on an analytic basis before making their final
selection.

EXHIBIT II-2

DBMS HARDWARE COMPARISON

[Matrix of DBMS products against the hardware on which each runs; the
individual cell marks are not recoverable.
DBMS (rows): ADABAS, DBMS 10, DMS II, DMS 1100, DATACOM D/B, IDMS, IDS, IMS,
IMAGE 3000, INQUIRE, MODEL 204, SEED, MICROSEED, SYSTEM 1022, SYSTEM 2000,
TOTAL.
Hardware (columns): BURROUGHS, CDC, DEC 10 & 20, DEC PDP 11, HARRIS,
HEWLETT-PACKARD, HONEYWELL, IBM 360/370, IBM COMPATIBLES, IBM SERIES/1,
IBM SYSTEM/3, INTERDATA, [illegible, possibly MODCOMP], NCR, SIEMENS, UNIVAC,
VARIAN, XITAN 280.]

• Reasons cited by the companies that did not perform an extensive evaluation
of DBMS products (29% of the respondents) were:

Hardware  compatibility  problems  and  only  one  or  two  products  were 
available. 

Very  few  products  around  at  the  time. 

Loyalty  to  their  hardware  manufacturer. 

• When asked what prompted their company to purchase a DBMS, almost three
quarters of respondents stated the decision was based on a particular
application requirement.

All  of  the  applications  cited  required  on-line  capabilities.  This  was  the 
prime  driving  force  for  selecting  a DBMS. 

• Companies that did not cite a particular application as the reason for buying
a DBMS justified the purchase for one of the following reasons:

"Wanted  data  independence  from  programs." 

"Needed  a system  to  manage  data  for  the  future." 

"Reduce  program  maintenance." 

"Easy  for  non-programmers  to  use." 

"Applications can be brought up faster."


"Query  language  - users  wanted  access  to  their  data  bases." 

"Reduction  of  redundant  files." 

One respondent stated, with evident honesty, that they bought one because
they believed "all the propaganda about DBMS capabilities."

• The  total  investment  to  implement  a DBMS,  according  to  INPUT  study 
respondents,  ranged  from  $35,500  to  $5  million.  This  includes  purchase  price 
(or  lease  commitment),  training,  and  conversion  costs. 

The  initial  purchase  price  of  selected  DBMS  ranged  from  $22,500  to 
$112,000.  The  monthly  lease  price  for  DBMS  ranged  from  $1,000  to 
$5,000. 

Ongoing  annual  expenditures  ranged  from  $1,000  to  $1  million  with  an 
average  cost  of  $122,000. 

In  addition  to  these  explicit  costs,  training  costs  and  associated  salaries 
contributed  to  the  total  operating  cost. 

• Highest  costs  were  experienced  with  IMS  installations: 

Several  companies  stated  it  took  a year  before  programmers  became 
proficient  in  the  use  of  IMS. 

Companies that were able to place a dollar value on training expenses
estimated from $15,000 to $40,000 per IMS programmer.

• The number of personnel required to support the DBMS (other than
applications programmers) ranged from none to one or two staff members for
TOTAL, IDMS, ADABAS, and System 2000. IMS users required a minimum of three
support personnel, and two installations reported having a staff of 10-12.

• Seventy-four percent stated they had to upgrade their hardware since
installing a DBMS. However, only 37% attributed the upgrade directly to the
DBMS. A number of companies stated the DBMS was not directly responsible but
did admit it was a contributing factor.

• Companies  who  experienced  the  largest  increase  in  hardware  upgrading  were 
the  IMS  installations.  Thirteen  of  the  18  IMS  installations  interviewed  had 
upgraded.  Six  stated  IMS  was  directly  responsible.  The  other  seven  stated  it 
was  a contributing  factor  or  they  upgraded  in  anticipation  of  IMS. 

• The  cost  of  DBMS  as  seen  by  users  is  high  and  70%  thought  their  expenditures 
would  increase.  However,  when  asked  if  they  had  to  do  it  over  again,  96% 
stated  they  would  still  buy  a DBMS  and  86%  would  select  the  same  one. 

C.  FUTURE  TRENDS  IN  DBMS 


• Data base techniques continue to evolve with a likely move in one direction
towards a data base standard modelled on the CODASYL Data Base Task
Group recommendations and in another direction with the anticipated
announcement of the first Relational Data Base products in the next twelve
months.

• Backend processors, the implementation of all or part of a DBMS in hardware,
will impact users by providing still another alternative.

INPUT does not anticipate that IBM will implement this approach. It is
more likely to integrate a DBMS processor in future mass storage
devices.

Other  technology  companies  will  implement  this  approach  and  will 
attain  revenues  of  approximately  $200  million  by  1983. 

• Section  VI  of  this  report  on  Automatic  Application  Development  Systems  is  a 
pointer  to  the  future,  providing  a more  user-oriented  interface  between 
applications  specifications  and  the  translation  of  those  specifications  into 
executable  systems  on  computers. 

• The emergence of Query Languages (available now with major DBMS products)
is directed at satisfying the needs of the end user more effectively, enabling
him to request directly the information he requires from a data base.

• The soon-to-be-announced Query-by-Example product from IBM is even more
end-user oriented. It is expected to be implemented initially upon IBM's DL/I
and later upon a Relational Data Base approach.

• On-line application development systems such as IBM's DMS/VS provide an
approach to translation of system specifications directly into operational
on-line applications. However, while permitting a higher level of productivity
in application development and maintenance than traditional programming
techniques, DMS/VS still requires programming expertise.

• The availability of ADMINS-11, operational on the Digital Equipment
Corporation PDP 11/70, is a leading example of what lies ahead in Automated
Application Development Systems. This product, while not a DBMS per se,
provides an easy-to-use application development and maintenance capability
and can be utilized by end users (with some training) in specifying and
generating their on-line applications.

• Less expensive communications, particularly from Satellite and Value Added
Network (VAN) vendors, will allow distribution and transmission of large data
bases.

This  will  accelerate  the  implementation  of  Distributed  Data  Processing 
(DDP)  incorporating  DBMS. 

. DDP  is  still  little  understood  at  the  user  level. 

. The  major  impact  will  be  post-1980. 

D.  MANAGEMENT  RECOMMENDATIONS 


• Application  development  delays  and  maintenance  costs  in  many  organizations 
are  dictating  a move  to  DBMS.  Such  a move  should  be  made  not  solely  to 
convert  existing  applications  to  data  base  but,  instead,  to  develop  new 
applications  under  control  of  data  base. 

This approach ensures a gradual transition to data base, using the
facilities offered by Data Base Management Systems more effectively
for new application development.


• Few organizations can afford to disregard application development and
maintenance time and cost savings that can be achieved with today's DBMS
products, while waiting for data base technology to evolve fully. Accordingly,
management should not wait for Relational Data Base.

As indicated above, the first Relational Data Base Systems are likely to
be announced in the next twelve months. It may well be several years
before the industry can effectively utilize the relational data base
approach, both from a software technology as well as a hardware
capability point of view.

Management should instead select the DBMS product that meets the
organization's unique requirements provided that the selected product is well
known with a large base of satisfied users. Such a user base provides an
incentive to the DBMS supplier to offer ongoing support and migrate its users
into new data base technology as it evolves, while protecting their investment
in applications already developed based on those DBMS products.


III DATA BASE SOFTWARE EVALUATION METHODOLOGY


A.  OBJECTIVES  AND  FUNCTIONS 


This report describes a methodology developed to enable DBMS to be
evaluated comparatively.


The methodology is based upon a number of evaluation criteria which
are independent of any particular data base implementation.

The criteria identify desirable requirements of any DBMS.


Each DBMS is then evaluated on the ways in which it satisfies the
various criteria.


Valid comparisons can then be made between different DBMS
implementations.

A company which is about to make a DBMS decision must assess the various
criteria in relation to its own company objectives. These objectives will
indicate the relative importance of the various criteria to the company in
satisfying its application and data base requirements.

The final DBMS decision must rest with each company and must be made in
light of the ability of the various suppliers' systems to satisfy those criteria
which the company identifies as most significant to its requirements.

This  report  makes  no  recommendations  as  to  which  DBMS  should  be  utilized. 
This  selection  will  vary  as  the  relative  importance  of  the  different  evaluation 
criteria  varies  between  companies. 
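
One way a company might apply its own relative importance to the criteria is a weighted score. The sketch below is an illustration only, not the method prescribed by this report: the criterion names, the weights, the 1-to-5 scores, and the products "Product A" and "Product B" are all hypothetical. It shows how the same product assessments can lead to a different ranking when the weights change.

```python
def weighted_score(weights, scores):
    """Weighted average of criterion scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(weights, products):
    """Product names ordered from highest to lowest weighted score."""
    return sorted(products,
                  key=lambda name: weighted_score(weights, products[name]),
                  reverse=True)


# Hypothetical assessments on a 1 (poor) to 5 (good) scale.
products = {
    "Product A": {"functional capabilities": 4, "data protection": 2, "ease of use": 5},
    "Product B": {"functional capabilities": 3, "data protection": 5, "ease of use": 4},
}

# Two hypothetical companies that weight the same criteria differently.
company_1 = {"functional capabilities": 5, "data protection": 3, "ease of use": 2}
company_2 = {"functional capabilities": 5, "data protection": 1, "ease of use": 4}

ranking_1 = rank(company_1, products)
ranking_2 = rank(company_2, products)
```

With these invented numbers the first company would favor Product B and the second Product A, which is the sense in which the selection varies as the relative importance of the criteria varies.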


Factors which are outside the scope of this report, but which should be taken
into account in a DBMS decision, include the impact of the various DBMS on
existing applications. These impacts may relate to:

Ease  of  migration  of  existing  applications  to  the  DBMS. 


Ability  for  existing  applications  to  co-reside  and  process  in  the  same 
system  as  new  applications  developed  using  DBMS. 

1. DATA BASE SOFTWARE OBJECTIVES


The prime objective of any DBMS can be stated as:

"Timely availability of current information to fulfill a variety of present and
future needs."


The key words are: TIMELY, CURRENT, VARIETY, PRESENT, and FUTURE.
Each of these key factors is discussed below to assess its importance to a
DBMS.


TIMELY information implies that multiple users should be able to access data
at the same time, to satisfy their differing requirements. These multiple users
may include batch programs retrieving data from the data base for processing
as well as multiple on-line terminals making data available directly to user
personnel.

• In order for all users to access CURRENT information, it is important that,
where possible, only one copy of that information exists in the data base and
that the information is updated at the earliest time to reflect any data
changes. Once updated, because it exists in only one copy, it is available to
all users who require it.

For  data  to  be  maintained  as  current  as  possible  implies  that  the  data 
must  be  captured  as  close  to  its  source  as  practical. 

This  generally  involves  on-line  terminals  which  enable  data  to  be 
entered  from  its  point  of  origin  and  therefore  update  the  data  base 
immediately. 

• The  requirement  to  satisfy  PRESENT  and  FUTURE  needs  implies  the  ability  to 
change  as  the  application  requirements  change.  An  application,  developed 
using  a DBMS  at  present,  should  be  able  to  move  to  a different  environment 
without  requiring  significant  reprogramming. 

• Another  consideration  in  the  provision  of  timely  information  to  multiple  users 
is  the  ability  to  satisfy  spontaneous  requests  for  information  which  enable  the 
data  to  be  viewed  in  a VARIETY  of  different  ways. 

For  example,  the  accounts  receivable  department  may  need  to  identify 
all  customers  with  accounts  over  90  days  old.  This  implies  that  the 
customer  data  needs  to  be  accessed  on  the  basis  of  the  accounts 
receivable  due  date  of  payment. 

The  order  department  may  need  to  assess  the  effect  on  customers  of 
the  inability  to  supply  a particular  part.  This  implies  the  need  to 
identify  for  each  part,  all  of  the  orders  requesting  that  part  and  the 
customers  who  placed  those  orders. 

To  satisfy  these  different  requirements  for  presentation  of  data,  a 
DBMS  ideally  should  only  retain  one  copy  of  that  data  - yet  allow  it  to 
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be  viewed  by  the  appropriate  user  in  a way  different  from  its  normal 
use. 
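
The  two  requests  above  can  be  sketched  as  two  views  computed  from  one
stored  copy  of  the  data.  This  is  only  an  illustration:  the  record
layout,  the  field  names  and  the  dates  are  assumptions  made  for  the
example  and  do  not  describe  any  particular  DBMS  product.

```python
from datetime import date

# Hypothetical single copy of the order data; every name here is illustrative.
ORDERS = [
    {"order": 1, "customer": "ACME", "part": "AB300", "due": date(1978, 1, 10), "paid": False},
    {"order": 2, "customer": "BOLT", "part": "AB300", "due": date(1978, 5, 20), "paid": False},
    {"order": 3, "customer": "ACME", "part": "XY100", "due": date(1978, 2, 1), "paid": True},
]


def overdue_customers(orders, today, days=90):
    """Accounts receivable view: customers with unpaid items more than `days` past due."""
    return sorted({o["customer"] for o in orders
                   if not o["paid"] and (today - o["due"]).days > days})


def customers_by_part(orders):
    """Order department view: for each part, the customers whose orders request it."""
    view = {}
    for o in orders:
        view.setdefault(o["part"], set()).add(o["customer"])
    return view


today = date(1978, 6, 1)
overdue = overdue_customers(ORDERS, today)
affected = customers_by_part(ORDERS)["AB300"]
print(overdue, sorted(affected))
```

Neither  view  copies  or  reorganizes  the  stored  records;  each  is
derived  on  request  from  the  same  data.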


• A DBMS  should  therefore: 

Satisfy  simultaneous  requests  for  information  from  multiple  users. 

Allow  the  same  data  to  be  shared  by  multiple  users. 

Consolidate  different  data  to  enable  information  relationships  to  be 
identified. 

Capture  data  at  its  source  to  ensure  the  most  up-to-date  version  of  that 
data  is  available. 


Easily  accommodate  new  data  needs,  without  requiring  significant 
modification  of  existing  data  or  programs. 

Satisfy  spontaneous  requests  for  information,  extracting  that  information
possibly  in  a different  form  from  the  way  it  is  normally  stored.


Allow  multiple  users  each  to  view  the  data  differently  in  a way  which 
best  meets  each  user's  own  requirements. 


2.  PROBLEMS  IN  SATISFYING  DBMS  OBJECTIVES 


• A number  of  problems  arise  in  satisfying  the  above  objectives: 

How  can  a data  structure  discipline  be  defined  which  will  enable
multiple  views  of  data  to  be  supported  and  yet  allow  that  data  to  be
easily  accessible  regardless  of  which  view  of  data  is  required?

How  can  multiple  users  share  the  same  data  resource  and  still  have  the
data  protected  from  unauthorized  access  or  change,  as  well  as  protected
from  loss  due  to  system  or  program  failure?

How  can  the  investment  in  current  applications  and  processing  be 
protected  while  accommodating  changes  in  data  to  meet  future  application
needs?

Above  all,  how  can  these  capabilities  be  provided  by  a system  that  is 
both  easy  to  use  and  economical? 

• In  evaluating  a DBMS,  the  ability  of  the  particular  product  to  address  these
problems  and  so  satisfy  the  DBMS  objectives  must  be  assessed.

B.  DBMS  EVALUATION  CRITERIA 


The  fundamental  criteria  to  be  used  to  evaluate  DBMS  are: 

Basic  Functional  Capabilities  of  a DBMS  should  define  an  easily 
accessible  data  structure  discipline  which  will  support  multiple  views  of 
consolidated  data. 

Data  Independence  describes  the  extent  to  which  an  application  program
is  isolated  from  a need  to  know  the  physical  organization  and
structure  of  the  data  base.  This  capability  enables  the  investment  in 
current  processing  to  be  protected  since  changes  in  the  data  base  to 
accommodate  new  applications  then  have  little  impact  on  existing 
applications. 

Data  Integrity  and  Security  are  two  terms  used  to  define  protection  of 
a shared  data  resource.  Data  integrity  refers  to  the  ability  of  the 
DBMS  to  protect  the  data  from  system  or  program  malfunctions,  while
data  security  protects  the  data  from  unauthorized  access  and  change
by  users. 




Ease  of  Use,  Cost  and  Performance  factors  of  a DBMS  ensure  that  the 
above  capabilities  are  easy  and  economic  to  use. 


C.  BASIC  FUNCTIONAL  CAPABILITIES 


• There  are  three  criteria  used  to  evaluate  Basic  Functional  Capabilities, 
namely: 


Easy  accessibility. 

Multiple  views  of  data. 

Data  consolidation  capabilities. 

1.  EASY  ACCESSIBILITY


• Easy  accessibility  of  data  should  satisfy  two  classes  of  users  of  a data  base, 
programmers  and  end  users. 


• Programmers  generally  write  programs  to  access  a data  base  to  satisfy 
"planned  data  requests."  Ideally,  a DBMS  should  support  three  classes  of 
programming  languages: 

Assembly  Language  is  a machine  oriented  language  which  is  used  to 
provide  a high  degree  of  efficiency  in  storage  utilization  and  CPU 
usage. 


COBOL,  PL/I  and  RPG  are  high  level  languages  oriented  towards 
commercial  applications.

FORTRAN  and  PL/I  are  high  level  languages  oriented  towards  scientific
applications. 

• The  end  user  who  uses  a DBMS  generally  has  a requirement  to  satisfy  a 
spontaneous  request  for  information. 

The  need  for  information  often  cannot  be  planned  in  advance  and  may 
not  be  repetitive  in  nature. 

Such  "ad  hoc"  data  requests  require  an  easy-to-use  language  which  does 
not  demand  prior  programming  knowledge  by  the  end  user. 

• Thus,  the  commercial  end  user  requires  an  easy-to-use  "Query"  language  which 
is  ideally  very  similar  to  English. 

• The  scientific  end  user,  on  the  other  hand,  requires  an  easy-to-use  language 
which  enables  concise  expression  of  scientific  formulae  without  requiring  the 
rigid  programming  conventions  of  FORTRAN  and  PL/I. 

• In  order  to  make  data  directly  available  to  the  end  user,  the  data  base  should 
be  accessible  from  terminals.  Accordingly,  the  data  base  system  should  also 
have  an  associated  data  communication  system  which  enables  both  pre-planned 
data  requests  as  well  as  spontaneous  data  requests  to  be  entered  from  on-line 
terminals. 

2.  MULTIPLE  VIEWS  OF  DATA  BASE 

• A DBMS  should  be  capable  of  satisfying  a variety  of  requests  for  information. 
These  requests  may  require  the  data  to  be  accessed  differently  from  the 
physical  storage  of  data  on  the  data  base. 

• The  data  base  system  should  be  capable  of  accessing  information  to  satisfy  a
request  for  information  in  a defined  sequence,  that  is,  sequentially.  Another  request
may  require  data  to  be  accessed  randomly,  while  a third  request  may  require
use  of  an  index  to  retrieve  data  in  some  order  different  from  its  normal
sequence.

• A Sequential  view  of  data  may  be  required  when  a large  number  of  data 
requests  are  to  be  processed  in  a predefined  sequence. 

This  is  often  the  case  in  batch  processing  and  enables  a great  deal  of 
efficiency  to  be  achieved  by  sequencing  the  input  requests  in  the  same 
order  as  the  physical  storage  of  data  in  the  data  base. 

In  an  on-line  environment  a similar  requirement  for  sequential  retrieval 
may  be  demanded  by  a terminal  operator's  request  to  browse  through 
the  data  base  in  a specific  sequence. 

• A Random  view  of  the  data  base  is  required  when  the  entry  sequence  of 
transactions  is  uncontrolled  and  the  complete  identity  of  the  requested  data  is 
known. 


An  example  of  a random  requirement  for  data  is  a simple  inquiry  and 
update  transaction  in  an  on-line  application.  In  this  case,  the  unique 
identification  of  the  data  required  is  fully  known.

• However,  with  many  queries  the  full  identification  of  the  data  is  not  known 
and  therefore  cannot  be  used  to  retrieve  the  data  randomly.  While  this  type  of 
request  could  be  satisfied  by  searching  a data  base  sequentially  until  the 
various  records  meeting  the  particular  conditions  were  identified,  the  time 
would  be  prohibitive.  Instead,  what  is  required  is  the  ability  to  access  an  index 
which  identifies  all  of  those  records  satisfying  the  particular  conditions  for 
retrieval. 

An  example  of  an  Indexed  view  of  data  would  be  an  on-line  request  to
retrieve  all  customers  with  customer  numbers  between  1,000  and  3,000.
Another  request  may  require  all  parts  to  be  retrieved  whose  part
numbers  are  greater  than  AB300.

Each  of  these  examples  implies  the  use  of  an  intermediate  index  which 
identifies  those  unique  records  meeting  the  condition  specified  and 
allows  those  records  to  be  retrieved  without  having  to  access  the  entire 
data  base. 
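
A  minimal  sketch  of  such  an  intermediate  index  follows.  It  assumes
that  the  records  are  held  in  arrival  order  and  that  the  index  is  a
sorted  list  of  (key,  position)  pairs;  both  choices,  and  all  names,  are
illustrative  assumptions  rather  than  the  technique  of  any  specific
product.

```python
import bisect

# Hypothetical records stored in arrival order, not in customer-number order.
RECORDS = [
    {"customer_no": 4100, "name": "Delta"},
    {"customer_no": 1500, "name": "Alpha"},
    {"customer_no": 2999, "name": "Gamma"},
    {"customer_no": 700, "name": "Omega"},
]


def build_index(records, field):
    """Return a list of (key, position) pairs sorted by key for one field."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def range_lookup(records, index, low, high):
    """Fetch the records whose key lies in [low, high], located through the index."""
    # The key list is rebuilt on every call to keep the sketch short;
    # a real index would keep its keys in sorted order permanently.
    keys = [key for key, _ in index]
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return [records[pos] for _, pos in index[start:stop]]


customer_index = build_index(RECORDS, "customer_no")
hits = range_lookup(RECORDS, customer_index, 1000, 3000)
print([rec["name"] for rec in hits])
```

Only  the  records  named  by  the  selected  index  entries  are  fetched;
the  remaining  records  are  never  read.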

• As  an  extension  of  this  capability,  the  data  base  should  permit  multiple  indices
to  be  used  to  access  the  data  base  in  several  sequences  different  from  that  in 
which  the  data  is  stored. 

• This  has  the  advantage  of  uniquely  identifying  only  those  records  that  satisfy
the  particular  search  criteria,  so  enabling  only  those  records  to  be  accessed 
and  resulting  in  less  processing  time  than  would  be  necessary  if  the  entire  data 
base  had  to  be  searched  for  each  request. 

3.  DATA  CONSOLIDATION  CAPABILITIES

• Data  consolidation  is  generally  the  main  reason  for  the  use  of  a data  base.
There  are  many  techniques  available,  all  of  which  have  their  particular
advantages  and  disadvantages  depending  upon  the  data  and  application  requirements.
Data  consolidation  techniques  include:

Networks. 

Hierarchies.

Inverted  files. 

Relational  models. 

• The  primary  purpose  of  data  consolidation  is  to  enable  a single  copy  of  the
piece  of  data  to  be  used  to  satisfy  the  requirements  of  all  users  of  that  data.
Recording  all  of  the  information  relating  to  a piece  of  data  only  once  enables
that  data  to  be  updated  once  and  then  be  available  in  its  most  current  form  to 
all  users  of  that  data.  Also,  a single  copy  of  data  must  be  easily  referenced 
from  other  data. 

• Since  much  data  is  naturally  variable  in  length,  such  as  names  or  addresses,  a 
method  of  handling  variable  length  data  is  needed. 

• Regardless  of  the  particular  data  base  technique  used,  all  DBMS  aim  to 
achieve: 

The  support  of  variable  length  data. 

The  ability  to  interrelate  data. 

• Various  data  base  systems  differ  in  the  degree  to  which  they  achieve  these 
capabilities.  In  order  to  evaluate  data  base  systems  on  their  data  base 
consolidation  capability,  some  general  criteria  to  measure  the  degree  of 
consolidation  achieved  must  be  established. 

• The  information  relating  to  a given  piece  of  data  can  be  segmented  into 
elements  called  "Repeating  Groups."  These  groups  are  characterized  by  their 
presence  in  differing  numbers,  or  absence,  in  various  data  records  in  a data 
base. 


Thus,  one  customer  may  have  several  open  invoices  reflecting  his 
accounts  receivable  status,  while  another  customer  may  have  no 
invoices  outstanding.  Similarly,  a customer  may  have  multiple  open 
orders,  each  containing  a multiple  number  of  parts  on  order. 

Each  of  these  repeating  group  elements  contains  information  relating  to 
the  particular  data.  For  example,  parts  on  order  contains  information 
relating  to  the  part  number,  quantity,  description,  unit  price,  discounts 
and  so  on.

The  customer  record  then  becomes  a variable  length  record,  some
customer  records  containing  more  repeating  group  elements  than  others.
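
The  customer  example  above  can  be  sketched  as  nested  repeating
groups.  The  class  and  field  names  below  are  assumptions  chosen  for
illustration;  the  point  is  only  that  each  customer  record  may  hold  a
different  number  of  occurrences  at  each  level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartOnOrder:
    """Innermost repeating group: one part requested on an order."""
    part_number: str
    quantity: int
    unit_price: float


@dataclass
class OpenOrder:
    """Repeating group under the customer, itself holding a nested group."""
    order_number: int
    parts: List[PartOnOrder] = field(default_factory=list)


@dataclass
class Invoice:
    """Repeating group directly under the customer."""
    invoice_number: int
    amount: float


@dataclass
class Customer:
    """The data entity; its length varies with the groups it holds."""
    name: str
    invoices: List[Invoice] = field(default_factory=list)
    orders: List[OpenOrder] = field(default_factory=list)


def occurrence_count(customer):
    """Count every repeating group occurrence held in one customer record."""
    return (len(customer.invoices) + len(customer.orders)
            + sum(len(order.parts) for order in customer.orders))


busy = Customer(
    "ACME",
    invoices=[Invoice(1, 250.0), Invoice(2, 80.0)],
    orders=[OpenOrder(10, [PartOnOrder("AB300", 5, 1.25),
                           PartOnOrder("XY100", 1, 9.0)])],
)
idle = Customer("BOLT")
print(occurrence_count(busy), occurrence_count(idle))
```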

• The  need  to  support  repeating  group  elements  and  the  requirement  to  allow 
variable  length  data  to  reside  in  the  data  base  become  important  data 
consolidation  criteria. 

• Criteria  which  determine  the  degree  of  variable  length  support  are  any 
constraints  introduced  by  the  data  base  system  which  limit: 

The  number  of  repeating  group  types  per  data  entity. 

The  number  of  occurrences  per  type. 

Support  for  variable  length  occurrences  of  repeating  group  elements. 

The  number  of  nested  levels  of  repeating  groups  available. 

• The  important  factor  to  consider  is  not  whether  the  data  base  system  provides 
support  for  repeating  group  elements  and  variable  length  records,  but  rather 
whether  there  are  any  significant  constraints  introduced  by  the  particular  data 
base  system  which  impact  the  usefulness  of  its  variable  length  record  support
for  particular  applications. 

• The  ability  to  interrelate  data  entities  is  another  important  data  consolidation 
criterion.  Again,  the  criteria  for  measuring  this  data  base  capability  are  any
constraints  which  may  impact  the  ability  of  a particular  data  base  system  to
satisfy  the  consolidation  requirements  of  various  applications.  Thus,  the
following  information  is  useful  to  identify  data  base  systems  which  may  have
restricted  data  consolidation  capability:

Number  of  relationships  allowed  for  the  entire  data  base. 

Number  of  relationships  allowed  for  a given  entity  occurrence.

Number  of  relationships  for  a program  or  a user  view.


• It  is  important  to  separate  those  DBMS  which  support  a higher  number  of 
interrelationships  than  could  conceivably  be  used  in  a given  application  from 
those  DBMS  that  have  limits  on  the  number  of  interrelationships  which  may 
prove  very  restrictive  for  effective  data  base  use. 

• Basic  functional  capabilities  summary: 

The  data  base  system  should  allow  easy  accessibility  both  by  a 
programmer  and  end  user.  Both  commercial  and  scientific  languages 
should  be  supported  for  each  class  of  user,  and  data  communication 
support  should  be  available  to  permit  access  to  the  data  base  from
on-line  terminals.

Multiple  views  of  the  data  should  be  provided  to  satisfy  different  types 
of  data  requests.  This  implies  the  need  to  access  data  sequentially, 
randomly,  and  via  an  index. 

Data  consolidation  requires  a data  base  system  to  support  variable 
length  records  comprising  repeating  groups  as  well  as  the  ability  to 
interrelate  data  entities  in  the  data  base.  In  these  two  areas,  the 
criteria  attempt  to  identify  any  constraints  imposed  by  the  data  base
system  which  may  result  in  restrictions  in  designing  and  implementing 
data  bases  to  satisfy  the  needs  of  various  applications. 


D.  DATA  INDEPENDENCE 


• The  objective  of  data  independence  is  to  protect  current  programs  from 
changes  to  the  data  base  which  do  not  directly  affect  the  logic  of  the  program.

• To  achieve  this  data  independence  the  program  should  be  divorced  as  much  as
possible  from  physical  data  storage  considerations  or  even  knowledge  of  the 
physical  storage  of  data.  In  this  way,  changes  can  be  made  in  the  manner  in 
which  data  is  physically  stored  without  necessarily  impacting  programs  which 
access  that  data. 

1.  LEVELS  OF  MAPPING

• DBMS  provide  varying  degrees  of  data  independence  generally  by  allowing  the 
application  program  to  request  data  by  name  without  knowing  the  physical 
location  of  that  data.  This  implies  the  ability  to  "map"  data  names  relating 
that  data  to  its  physical  location  in  a data  base.  It  is  in  the  degree  of  mapping 
support  that  the  data  independence  ability  of  a DBMS  can  be  evaluated. 

• Each  of  these  maps  effectively  becomes  a filter  where  data  dependencies 
which  would  otherwise  be  located  in  the  application  program  can  be  removed 
and  placed  instead  in  the  DBMS-provided  maps.  This  allows  the  application 
code  to  remain  fairly  independent  of  physical  data  considerations,  with  data 
considerations  being  resolved  in  the  individual  maps.  (See  Exhibit  III-1.)

• As  a general  statement,  the  greater  the  number  of  maps  provided  by  a data 
base  system,  the  greater  is  the  data  independence  achievable. 

• These  mapping  levels  are  the  major  criteria  for  assessing  the  data  independence
capability  of  a data  base  system.

• The  first  mapping  level  is  an  Internal  or  physical  definition  map  which  provides 
independence  in  the  program  of  the  particular  access  method  used  or  logical 
record  format  adopted.  It  is  at  a higher  level  than  the  mapping  provided  by 
the  operating  system  but  still  requires  the  program  to  be  aware  of  the  specific 
physical  structure  defined  for  the  data  base.

EXHIBIT  III-1

DATA  BASE  MAPPING  LEVELS

• The  next  highest  mapping  level  is  the  Conceptual  level  which  serves  as  a
means  of  transforming  a physical  data  base  structure  into  a logical  or  external 
data  base  structure. 

This  enables  the  application  program  to  be  independent  of  the  physical 
data  storage  structure  and,  in  particular,  enables  the  data  to  be  viewed 
logically  in  ways  quite  different  from  the  physical  storage  of  data. 

For  example,  a conceptual  definition  map  may  enable  data  physically 
stored  as  one  type  of  data  base  structure  to  be  viewed  as  a different 
data  base  structure  (such  as  an  inverted  structure)  without  requiring 
any  physical  move  or  transformation  of  that  data. 

• The  highest  level  is  an  External  or  subset  definition  map  which  enables  the 
application  program  to  view  only  that  part  of  the  data  base  which  is  pertinent 
to  its  needs.  This  map  isolates  the  program  from  knowledge  of  any  other  part 
of  the  data  base  which  is  not  required  by  that  program.  This  has  the 
advantage  that  those  parts  of  the  data  base  whose  existence  is  unknown  to  the 
program  can  change  without  any  impact  on  the  existing  program  logic. 

• Many  DBMS  today  provide  only  two  mapping  levels,  the  Internal  and  External 
map.  The  implication  of  this  is  that  the  application  program,  while  able  to 
view  only  a subset  of  the  data  base  through  the  External  map,  still  must  view 
that  data  in  a way  very  close  to  the  physical  storage  of  the  data.  Therefore,  a 
need  to  change  that  physical  storage  of  data  to  meet  different  application 
requirements  may  have  a serious  impact  on  the  amount  of  program  modification
necessary.

• Generally,  only  where  there  are  three  mapping  levels,  including  the  intermediate
Conceptual  level,  is  the  application  program  able  to  view  data
logically  in  a way  which  may  not  exist  physically  in  the  data  base.  Transformation
between  the  physical  view  and  the  logical  view  is  achieved  at  the
Conceptual  map  level.  Consequently,  provided  the  logical  view  of  the  data
base  continues  to  be  presented  unchanged  to  the  program,  the  physical  storage
of  that  data  base  may  change  in  any  way  to  meet  new  application  requirements.
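
The  three  mapping  levels  can  be  sketched  as  three  small  tables  that
sit  between  the  program  and  the  stored  record.  The  sketch  assumes  a
fixed-width  character  record  and  invented  field  names;  it  illustrates
the  principle  only  and  is  not  the  mapping  mechanism  of  any  product
discussed  in  this  report.

```python
# Internal map: where each stored field sits inside the stored record
# (offset, length). Field names and layout are hypothetical.
INTERNAL_MAP = {"CUSTNO": (0, 4), "CNAME": (4, 10), "CREDIT": (14, 6)}

# Conceptual map: logical field names mapped onto stored field names.
CONCEPTUAL_MAP = {"customer_no": "CUSTNO", "name": "CNAME", "credit_limit": "CREDIT"}

# External map: the subset of logical fields one program is allowed to see.
BILLING_VIEW = ["customer_no", "name"]


def read_view(stored, external_map, conceptual_map=CONCEPTUAL_MAP,
              internal_map=INTERNAL_MAP):
    """Resolve a program's view through the external, conceptual and internal maps."""
    record = {}
    for logical in external_map:
        offset, length = internal_map[conceptual_map[logical]]
        record[logical] = stored[offset:offset + length].strip()
    return record


stored = "0042" + "ACME".ljust(10) + "005000"
original = read_view(stored, BILLING_VIEW)

# Physical reorganization: the field order on disk changes, so only the
# internal map is replaced. The program's view is unchanged.
NEW_INTERNAL_MAP = {"CREDIT": (0, 6), "CUSTNO": (6, 4), "CNAME": (10, 10)}
reorganized = "005000" + "0042" + "ACME".ljust(10)
after_change = read_view(reorganized, BILLING_VIEW, internal_map=NEW_INTERNAL_MAP)
print(original, after_change)
```

Because  the  program  names  only  logical  fields,  the  reorganized
record  yields  the  same  view  without  any  change  to  the  program.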

2.  FIELD  LEVEL  DATA  INDEPENDENCE

The  most  important  criterion  in  data  independence  is  the  number  of  mapping 
levels  supported  by  the  particular  data  base  system.  An  additional  factor  in 
assessing  data  independence  is  the  ability  of  a data  base  system  to  offer  data 
independence  at  the  field  level. 

The  advantage  of  field  level  independence  is  that  the  program  may  request 
each  field  by  name  and  is  unaffected  by  changes  in  other  fields  which  it  does 
not  require.  In  particular,  the  addition  of  new  fields  in  a repeating  group 
element  does  not  impact  programs  which  do  not  require  those  fields. 

On  the  other  hand,  a data  base  system  which  uses  as  its  lowest  retrieval  level 
the  repeating  group  element  may  be  affected  by  changes  in  the  fields  within 
that  repeating  group  element. 

One  opportunity  which  field  level  independence  offers  is  the  ability  to 
translate  data  in  a field  from  one  format  (example,  binary)  to  another  format 
(example,  character)  as  field  data  moves  between  the  data  base  system  and 
the  program.  This  enables  the  application  program  to  be  presented  with  data 
in  a form  most  efficient  for  processing  while  the  data  base  system  is  able  to 
store  data  to  achieve  most  efficient  utilization  of  disk  storage. 

Another  factor  to  be  considered  in  field  level  translation  is  the  ability  to 
compress  data  by  the  application  of  various  algorithms  before  that  data  is 
written  to  disk  storage  and  subsequently  to  expand  the  data  on  retrieval  from 
disk. 

An  application  of  this  field  translation  capability  is  in  the  enciphering  and 
deciphering  of  data  between  disk  storage  and  the  program  for  security  reasons, 
to  prevent  unauthorized  access  to  information.

Many  data  base  systems  provide  this  data  translation,  compression  and
expansion  or  enciphering  and  deciphering  automatically  without  any 
action  on  the  part  of  the  application  programmer.  This  approach 
supports  a high  level  of  security. 

Other  data  base  systems,  however,  require  the  application  programmer 
to  participate  in  the  translation  process,  potentially  compromising  data 
security. 
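
Field  level  translation  and  compression  can  be  sketched  as  pairs  of
store  and  fetch  routines  applied  as  a field  moves  between  the  program
and  disk  storage.  The  function  names,  the  4-byte  binary  format  and  the
use  of  a general-purpose  compression  routine  are  assumptions  made  for
this  example;  enciphering  would  follow  the  same  pattern  but  is  omitted.

```python
import struct
import zlib


def store_quantity(text):
    """Character form from the program -> 4-byte big-endian binary for storage."""
    return struct.pack(">i", int(text))


def fetch_quantity(raw):
    """4-byte binary from storage -> character form for the program."""
    return str(struct.unpack(">i", raw)[0])


def store_text(text):
    """Compress a character field before it is written to disk."""
    return zlib.compress(text.encode("utf-8"))


def fetch_text(raw):
    """Expand a compressed field on retrieval from disk."""
    return zlib.decompress(raw).decode("utf-8")


raw_qty = store_quantity("1250")
description = "WIDGET " * 40          # repetitive data compresses well
raw_desc = store_text(description)
print(len(raw_qty), fetch_quantity(raw_qty))
print(len(description), len(raw_desc))
```

When  the  data  base  system  applies  such  routines  itself,  the
application  program  sees  only  the  character  form  and  never  handles
the  stored  representation.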

• One  of  the  main  advantages  of  field  level  data  independence  is  the  ability  for 
the  application  program  to  request  fields  by  name  and  be  unaffected  by 
changes  in  other  fields  in  the  data  base,  or  by  the  addition  of  new  fields.  For 
effective  data  independence,  however,  the  application  programmer  is  required 
to  request  each  field  by  name  rather  than  request  groups  of  fields  together. 
Unless  each  field  is  requested  uniquely,  field  level  data  independence  is 
compromised. 

The  implications  of  requesting  each  individual  field  are,  of  course,  the 
additional  processing  overhead  necessary  for  the  data  base  system  to 
identify  each  field  requested  and  retrieve  that  appropriate  field  from 
the  data  base  for  presentation  to  the  program. 

• Some  data  base  systems  provide  a degree  of  compromise  by  enabling  repeating 
group  elements  to  be  defined  as  comprising  only  one  field  or  alternatively 
groups  of  fields. 

In  this  way,  upon  identifying  a repeating  group  field,  the  application 
program  is  presented  with  the  appropriate  field  for  field  level  data 
independence. 

On  the  other  hand,  a group  of  fields  that  will  always  be  used  together 
for  processing  and  that  are  unlikely  to  be  affected  by  possible  future
application  data  changes,  can  be  retrieved  by  an  application  program  in 
the  one  repeating  group  occurrence  by  identifying  that  repeating  group.

The  levels  of  mapping  and  field  level  definition  capabilities  define  the  data
independence  potential  of  the  data  base  system.  However,  each  data  base 
system  must  be  evaluated  in  detail  to  determine  the  implications  of  various 
types  of  changes  which  may  be  encountered  in  the  normal  growth  of 
applications  supported  by  a data  base  system. 

The  impact  of  the  most  common  types  of  changes  in  the  data  base  system
should  be  assessed  as  a final  consideration  of  data  independence.  (See  Exhibit  III-2.)


The  effect  of  a change  in  the  physical  device  type  used  to  store  the 
data  base  should  be  assessed.  While  this  change  implies  the  need  to 
reload  the  data  base  on  the  new  device,  the  data  base  system  may  also 
require  the  programs  to  be  recompiled  and  perhaps  also  the  program 
logic  to  be  changed. 

Changes  in  specification  of  the  operating  system  access  method  used  by 
the  data  base  may  be  required  for  support  of  new  devices.  For 
example,  when  an  application  is  transferred  from  a batch  processing 
environment  utilizing  a sequential  access  method  to  an  on-line  environment,
the  data  base  system  should  enable  data  to  be  processed  in  a
random  fashion.  While  this  change  implies  reloading  of  the  data  base,  it 
may  also  require  recompilation  of  the  programs  and  possibly  a change  in 
program  logic. 

The  effect  of  program  changes  in  the  supported  views  of  the  data  entity 
should  be  assessed  to  determine  the  impact  on  the  data  base  and 
programs.  An  example  would  be  the  changes  necessary  to  incorporate 
an  indexed  view  of  an  entity  that  was  previously  accessed  on  a random 
basis  only. 

The  effect  of  adding  a new  data  entity  to  the  data  base  should  be
assessed.

EXHIBIT  III-2

DATA  INDEPENDENCE  EVALUATION  CRITERIA

                                     ACTION
                                     CHANGE         RECOMPILE      RELOAD
CHANGE                               PROGRAM        PROGRAM        DATA
                                     LOGIC                         BASE
---------------------------------    -----------    -----------    -----------
CHANGE DEVICE TYPE
CHANGE ACCESS METHOD
CHANGE ENTITY VIEW
ADD NEW ENTITY
ADD NEW REPEATING GROUP TYPE
ADD NEW RELATIONSHIP
ADD NEW FIELD TO REPEATING GROUP
CHANGE FIELD FORMAT

COMPLETE  FOR  EACH  DBMS  EVALUATED.

Similarly,  the  addition  of  a new  repeating  group  type  to  an  existing  data
entity  should  be  assessed  as  to  its  impact  on  the  logic  of  existing 
application  programs. 

The  addition  of  a new  relationship  to  an  existing  data  entity  may 
impact  existing  programs  whose  logic  depends  upon  the  particular 
physical  organization  existing  when  the  program  was  originally  written. 

The  addition  of  a new  field  to  an  existing  repeating  group  type  may 
require  logic  changes  in  programs  which  are  unable  to  use  field  data 
independence  in  accessing  the  data  base. 

Similarly,  the  ability  to  change  the  field  format  of  elements  within  a 
repeating  group  may  require  program  logic  changes. 


• The  data  independence  criteria  may  be  summarized  as  follows: 

To  provide  effective  data  independence,  a data  base  system  should 
support  as  a minimum  two  levels  of  mapping,  an  internal  definition 
which  maps  the  physical  characteristics  of  the  data  base,  and  an 
external  definition  which  enables  the  program  to  access  only  a subset  of 
that  data  base. 

An  additional  degree  of  data  independence  is  provided  by  a third  level 
of  mapping  (conceptual  definition)  which  enables  a physical  data  base 
structure  to  be  transformed  into  a logical  data  base  structure  as  viewed 
by  the  application  program. 

The  ability  to  request  data  from  a data  base  by  field  provides  a further 
level  of  data  independence.  This,  together  with  field  format  translation,
compression  and  expansion,  or  enciphering  and  deciphering,  offers
additional  independence  from  physical  data  storage  considerations.

• The  final  evaluation  of  the  data  independence  capability  of  a particular  data
base  system,  however,  is  the  degree  of  modification  necessary  to  accommodate
specific  changes  in  the  data  base,  access  methods,  data  relationships
or  device  types.  The  impact  of  each  of  these  changes  should  be  assessed  with 
respect  to  the  need  to  reload  the  data  base,  recompile  programs  and  possibly 
change  the  logic  of  programs. 

E.  DATA  INTEGRITY 


• There  are  two  criteria  for  evaluating  the  ability  of  various  data  base  systems 
to  provide  adequate  Data  Integrity.  These  are: 

Exclusive  Control 

Recovery/Restart 

1.  EXCLUSIVE  CONTROL

• Exclusive  Control  is  a technique  which  a DBMS  uses  to  prevent  two  users  from 
simultaneously  updating  the  same  data. 

It  is  generally  implemented  as  a software  lock-out  mechanism  such  that 
the  first  user  to  request  a record  for  subsequent  update  is  given 
exclusive  control  of  that  record. 

A subsequent  user,  also  wishing  to  update  the  same  record,  is  forced  by 
the  lock-out  mechanism  of  the  DBMS  to  wait  until  the  first  user  has 
completed  its  update  and  replaced  the  updated  record. 

The  data  base  lock-out  mechanism  then  gives  the  second  user  exclusive 
control  of  the  updated  record  and  enables  it  to  carry  out  its  update 
processing.

• The  result  of  Exclusive  Control  is  to  ensure  that  no  data  is  lost  and  the  data
base  integrity  is  not  compromised. 

• While  all  DBMS  which  permit  concurrent  update  activity  should  provide  some 
exclusive  control  lock-out  mechanism,  an  important  evaluation  criterion  is  the 
level  at  which  lock-out  occurs,  and  its  impact  on  performance. 

If  the  lock-out  occurs  at  the  data  base  level,  only  one  data  base  record
at  a time  may  be  updated,  and  each  update  must  be  complete  before  the
next  can  begin.  This  may  significantly  impact  performance,  since  other
updates  are  forced  to  wait  even  when  they  are  for  different  data  base
records  than  the  update  which  has  gained  exclusive  control  of  the  data  base.

Another  level  of  lock-out  is  at  the  record  level.  This  is  the  most 
common  level  for  lock-out  and  enables  simultaneous  update  to  proceed 
for  several  users,  each  accessing  different  data  base  records.  However, 
the  lock-out  mechanism  ensures  that  only  one  user  at  a time  is  able  to 
update  the  same  data  base  record. 

The  highest  level  of  lock-out  is  at  the  field  level.  In  this  case,  only 
when  two  users  are  simultaneously  attempting  to  update  the  same  field 
in  the  same  record  does  a lock-out  occur.  Although  this  permits  the 
greatest  possible  degree  of  concurrent  update  capability,  the  software 
control  and  overhead  to  achieve  this  level  of  exclusive  control  can 
sometimes  be  quite  high. 
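
Record  level  lock-out  can  be  sketched  with  one  software  lock  per
record.  The  class  name,  the  record  keys  and  the  use  of  threads  to  stand
in  for  concurrent  users  are  assumptions  made  for  this  illustration.

```python
import threading
from collections import defaultdict


class RecordLocks:
    """Lock table holding one lock per record key (record level lock-out)."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, key):
        with self._guard:
            return self._locks[key]


database = {"part-132": 0, "part-154": 0}
locks = RecordLocks()


def update(key, times):
    """Simulate one user repeatedly reading, changing and replacing a record."""
    for _ in range(times):
        with locks.lock_for(key):          # wait here if another user holds the record
            current = database[key]        # read the record
            database[key] = current + 1    # replace the updated record


# Two users update the same record; a third updates a different record
# and is never made to wait for the other two.
workers = [threading.Thread(target=update, args=(key, 1000))
           for key in ("part-132", "part-132", "part-154")]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(database)
```

A  single  lock  shared  by  every  record  would  correspond  to  lock-out  at
the  data  base  level,  and  one  lock  per  field  to  lock-out  at  the  field
level.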

2.  DATA  INTEGRITY  FOLLOWING  PROGRAM  FAILURE 

• To  ensure  that  the  data  base  can  be  recovered  in  the  event  of  a system  failure, 
the  DBMS  will  normally  direct  both  the  "before"  image  of  each  record  and  the 
"after"  image  (following  an  update  of  a record)  to  a DBMS  log.  The  log  will  be 
used  to  back  out  (rollback)  the  effect  of  partially  processed  programs  in  the 
event  of  system  failure.  This  will  be  discussed  in  more  detail  later  in  Chapter 
III,  Section  4.

• In  the  event  of  application  program  failure,  there  are  two  general  exclusive
control  solutions  to  overcome  the  potential  data  integrity  problems  resulting 
from  the  failure. 

• One  solution  requires  that  in  backing  out  the  processing  for  User  1 following  a
program  failure,  the  DBMS  should  determine  whether  this  backout  will  also  result
in  the  loss  of  User  2 processing.  To  ensure  integrity,  the  DBMS  should  then
back  out  all  other  processing  associated  with  User  2.  In  turn,  backing  out
additional  processing  for  User  2 may  force  the  backout  of  other  user
processing  requests.

This  solution  can  sometimes  result  in  a "backout  cascade"  with  all  data 
base  update  activity  being  backed  out  (perhaps  even  to  the  start  of  the 
day)  in  the  event  of  an  application  program  failure. 

This  is  obviously  not  very  desirable. 
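
The cascade described above can be sketched as a simple closure computation. This is an illustrative sketch only: the `dependencies` table (which user has used whose uncommitted updates) and the function name `cascade` are assumptions made for the example, not features of any product discussed in this report.

```python
def cascade(failed_user, dependencies):
    """Return every user whose processing must be backed out.

    `dependencies` maps a user to the set of users whose uncommitted
    updates that user has already used (hypothetical bookkeeping).
    """
    to_back_out = {failed_user}
    changed = True
    while changed:
        changed = False
        for user, used in dependencies.items():
            # A user who relied on any backed-out user must also be backed out.
            if user not in to_back_out and used & to_back_out:
                to_back_out.add(user)
                changed = True
    return to_back_out


# User 2 used User 1's updates, User 3 used User 2's, User 4 is independent.
deps = {"user2": {"user1"}, "user3": {"user2"}, "user4": set()}
victims = cascade("user1", deps)
```

Here a failure of User 1 drags User 2 and User 3 into the backout while User 4 is untouched, which is the chain reaction the text warns about.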

• A second  solution  requires  that  the  DBMS  keep  the  lock-out  mechanism  active 
until  all  associated  processing  for  a particular  user  request  is  completed  and
the  program  ends  normally.  Only  then  can  all  data  base  updates  for  that  user  be
regarded  as  committed  updates. 

In  the  meantime,  other  users  wishing  to  simultaneously  update  the  same 
records  are  kept  waiting  until  the  first  user  finishes  its  processing. 

This  can  have  a severe  impact  on  performance  if  the  lock-out  mechanism
is  at  a coarse  level,  such  as  the  entire  data  base.  In  fact,  at  this  level,  it
implies  single  thread  processing  of  one  user  request  at  a time  with  no
possibility  of  overlapping  other  user  requests  requiring  access  to
different  data  base  records.

However,  if  the  lock-out  mechanism  is  active  at  the  record  occurrence 
level,  other  user  requests  which  update  different  data  base  records  are 
able  to  proceed  simultaneously.  It  is  only  when  two  requests  attempt  to 
update  the  same  data  base  record  that  the  second  user  request  waits
until  all  processing  for  the  first  user  has  been  completed. 

This  technique,  sometimes  referred  to  as  "program  isolation,"  ensures 
full  data  integrity  and  protection  of  the  data  base  from  application 
program  failure. 
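
A minimal sketch of program isolation at the record occurrence level follows. The class `LockTable` and its methods are hypothetical names chosen for illustration; a real DBMS would queue waiting requests rather than simply returning False.

```python
class LockTable:
    """Record-level exclusive control, held until the owner finishes."""

    def __init__(self):
        self.owner = {}  # record key -> user holding exclusive control

    def request(self, user, record):
        """Return True if `user` holds `record`; False means it must wait."""
        holder = self.owner.setdefault(record, user)
        return holder == user

    def release_all(self, user):
        """At normal end (commit) or backout, free every record `user` holds."""
        for record in [r for r, u in self.owner.items() if u == user]:
            del self.owner[record]


locks = LockTable()
granted_1 = locks.request("user1", "part-132")  # User 1 gains exclusive control
granted_2 = locks.request("user2", "part-154")  # different record: no conflict
blocked = locks.request("user2", "part-132")    # same record: User 2 must wait
locks.release_all("user1")                      # User 1 ends normally
retry = locks.request("user2", "part-132")      # the waiting request now succeeds
```

Because locks are released only in `release_all`, no other user ever sees an update that might later be backed out.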

3.  DEADLOCK  DETECTION  AND  CORRECTION 

• Another  exclusive  control  problem  is  that  of  deadlocks  and  is  illustrated  by  the 
following  example: 

User  1  updates  part  132  in  the  data  base.  It  therefore  is  given  exclusive
control  of  that  record  and  retains  exclusive  control  until  it  has 
completed  its  processing.  At  the  same  time,  User  2 requests  update  of 
a different  record  - part  154  in  the  data  base.  It,  too,  is  given  exclusive 
control  of  that  record  until  it  completes  its  processing.  Both  users  are 
able  to  simultaneously  process. 

However,  if  after  updating  part  132  User  1  then  needs  to  update  part
154,  it  will  wait,  as  User  2 has  already  gained  exclusive  control  of  part 
154,  and  will  only  be  permitted  to  update  part  154  when  User  2 has 
completed  processing. 

If  User  2,  having  updated  part  154,  now  requests  update  of  part  132,  it 
too  will  wait  as  User  1  has  already  gained  exclusive  control  of  part  132.

The  result  is  that  both  users  are  waiting  on  the  other  to  finish 
processing  and  free  the  appropriate  record  from  exclusive  control. 
Neither  user  can  finish  processing  until  the  other  has  freed  its  exclusive 
control  of  the  record  it  needs. 

The  result  is  called  a "deadlock"  and  is  sometimes  referred  to  as  a 
"deadly  embrace." 
The  solution  to  a deadlock  requires  the  DBMS  to:


Detect  the  deadlock  condition. 

Break  the  deadlock  by  abnormally  terminating  one  of  the  users. 

• It  is  only  when  one  of  the  competing  users  is  forced  to  terminate  that  the 
other  user  can  gain  exclusive  control  of  the  record  it  needs  and  complete 
processing. 
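
One common way to detect the condition is to look for a cycle among waiting users. The sketch below assumes each waiting user waits for exactly one other user, as in the two-user example above; the `waits_for` table and the choice of the last user in the cycle as the one to terminate are illustrative assumptions, not a description of any particular DBMS.

```python
def find_deadlock(waits_for):
    """Return the users forming a wait cycle, or None if there is none.

    `waits_for` maps each waiting user to the user that holds the
    record it needs (hypothetical bookkeeping kept by the DBMS).
    """
    for start in waits_for:
        seen = []
        user = start
        # Follow the chain of waits until it ends or repeats.
        while user in waits_for and user not in seen:
            seen.append(user)
            user = waits_for[user]
        if user in seen:
            return seen[seen.index(user):]
    return None


# User 1 waits for part 154 (held by User 2); User 2 waits for part 132.
waits_for = {"user1": "user2", "user2": "user1"}
cycle = find_deadlock(waits_for)
victim = cycle[-1]       # pick one user to terminate abnormally
del waits_for[victim]    # its wait disappears once it is backed out
```

After the terminated user's wait is removed, a second call to `find_deadlock` finds no cycle and the surviving user can proceed.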

• Some  DBMS  do  not  attempt  to  identify  and  correct  this  situation  but  instead
place  responsibility  on  the  DBMS  user  by  indicating  that  requests  for  data
base  records  should  always  be  made  in  the  same  sequence.  This  philosophy  would
force  both  User  1  and  User  2  to  always  request  the  data  base  records  for
processing  in  the  same  sequence,  say  record  1  then  record  2.

This  may,  however,  place  an  abnormal  restriction  on  DBMS  application 
requirements,  which  may  dictate  that  simultaneously  processed  user 
requests  update  data  base  records  in  different  sequences. 

• Ideally,  the  DBMS  should  detect  the  deadlock  condition  and  then  itself  break 
the  deadlock  by  abnormally  terminating  one  of  the  competing  users.  Because 
this  is  a system-created  abnormal  termination  of  one  of  the  programs,  the 
DBMS  should  then  automatically  recover  the  abnormally  terminated  request. 

This  implies  that  the  DBMS  system  automatically  backs  out  the  partial
processing  carried  out  by  User  2  up  to  the  time  of  the  deadlock  and
then,  once  User  1  has  completed  processing,  automatically
reprocesses  User  2's  requests  in  the  sequence  originally  specified.

This  reprocessing,  now  that  User  1  has  completed,  should  proceed
without  the  creation  of  the  deadlock  condition.
Ideally  also,  the  abnormal  termination  of  one  user  request,  its  backout  and
reprocessing  should  be  transparent  to  the  user  which  made  the  request.  It 
should  be  completely  handled  by  the  DBMS.  The  only  apparent  evidence  of  the 
existence  of  a deadlock  should  be  the  additional  processing  or  response  time 
necessary  to  break  the  deadlock  and  reprocess  the  abnormally  terminated 
request. 


• This  abnormal  termination  and  reprocessing  after  breaking  a deadlock  are  a 
further  extension  of  the  "program  isolation"  technique. 

• In  summary,  exclusive  control  requires  the  DBMS  to  prevent  simultaneous 
update  of  the  same  data  base  record  by  two  or  more  users.  This  is 
implemented  by  a lock-out  mechanism.  One  evaluative  criterion  for  exclusive 
control  is  to  determine  the  lowest  level  at  which  the  lockout  mechanism  is 
active  and  its  impact  on  performance. 


Another  important  exclusive  control  consideration  is  the  ability  for  the  DBMS 
to  ensure  no  loss  of  data  integrity  in  the  event  of  abnormal  termination  of  a 
particular  program. 


This  is  sometimes  referred  to  as  "program  isolation." 

It  is  also  used  with  deadlock  detection  and  resolution  by  abnormally 
terminating  one  of  two  or  more  deadlocked  user  requests,  backing  out 
the  terminated  request  and  reprocessing  that  backed  out  request. 

• Important  considerations  in  evaluating  a DBMS  on  the  basis  of  these  criteria
are  to  identify  who  is  responsible  - the  DBMS  itself  or  the  user  of  the  DBMS  -
for:


Establishing  exclusive  control. 
Ensuring  program  isolation. 
Detecting  and  breaking  deadlocks.


For  full  data  integrity,  these  exclusive  control  considerations  must  be  supported.
An  important  evaluation  consideration  then  is  to  determine  whether
this  requires  user-written  code,  or  whether  it  is  supported  as  standard  by  the
DBMS.

4.  RECOVERY  AND  RESTART

The  importance  of  recovery  and  restart  capabilities  for  a DBMS  is  well
recognized.

Recovery  applies  to  the  ability  to  rebuild  a damaged  data  base  to  a designated 
point  from  a past  copy. 

Recovery  is  generally  necessary  when  the  data  base  has  been  physically 
damaged,  as  indicated  by  an  unrecoverable  I/O  error  and  the  inability  to 
read  or  write  data  on  a particular  track  or  section  of  a disk  storage 
drive. 

In  this  case,  recovery  implies  the  reconstruction  of  the  data  base  using 
an  earlier  copy  of  the  data  base  and  updating  it  with  all  activity  which 
occurred  to  that  data  base  since  the  copy  was  taken. 

Restart  applies  to  the  ability  to  re-establish  the  system  after  an  interruption 
due  to  hardware  or  software  failure. 

Generally,  restart  does  not  require  recovery  of  the  data  base  as 
described  above  unless  the  hardware  failure  was  due  to  a disk  error. 

Instead,  it  requires  the  identification  of  all  processing  which  had  not 
completed  at  the  time  of  the  failure. 
This  processing  generally  should  be  backed  out  (rolled  back)  to  some
predefined  point  and  then  possibly  reprocessed  to  completion.

• Both  recovery  and  restart  require  that  all  data  base  activity  be  recorded  on  a 
DBMS  system  log.  This  log  contains  two  types  of  records,  the  before  image  of 
a data  base  record  prior  to  an  update  and  the  after  image  of  the  record 
following  an  update. 

• The  after  image  of  a record  is  used  during  recovery  to  reconstruct  the  data
base  from  a previous  copy  by  reapplying  the  update  activity  logged  since  that
copy  was  taken.  This  update  activity  may  involve  addition  of  new  data  base
records,  deletion  of  records  or  update  of  information  in  existing  records.

The  after  image  of  an  addition  is  the  new  record  which  has  been  added. 

The  after  image  of  a deletion  is  an  indication  of  the  identity  of  the 
record  which  was  deleted. 

The  after  image  of  an  update  is  the  image  of  the  record  following
completion  of  the  update. 

• The  before  image,  prior  to  update  activity,  is  used  to  back  out  data  base
processing.  This  backout  may  involve  the  deletion  of  a record  which  had 
previously  been  added  to  a data  base,  the  addition  of  a record  which  had 
previously  been  deleted  from  the  data  base,  or  the  replacement  of  an  updated 
record  by  the  before  image  of  that  record  prior  to  the  update.  The  end  result 
of  this  backout  is  to  restore  each  data  base  record  to  its  status  prior  to 
initiation  of  the  processing  which  is  being  backed  out. 
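
The two uses of the log can be illustrated with a small in-memory sketch. The log layout (a list of dictionaries with `key`, `before` and `after`, where None stands for "no record") and the function names are assumptions made for the example only.

```python
def apply_update(db, log, key, new_value):
    """Change `db` and log the before and after images (None = no record)."""
    log.append({"key": key, "before": db.get(key), "after": new_value})
    if new_value is None:
        db.pop(key, None)      # deletion
    else:
        db[key] = new_value    # addition or update


def back_out(db, log):
    """Restart: restore before images, most recent entry first."""
    for entry in reversed(log):
        if entry["before"] is None:
            db.pop(entry["key"], None)   # undo an addition
        else:
            db[entry["key"]] = entry["before"]


def roll_forward(copy, log):
    """Recovery: rebuild from an earlier copy by applying after images."""
    db = dict(copy)
    for entry in log:
        if entry["after"] is None:
            db.pop(entry["key"], None)   # redo a deletion
        else:
            db[entry["key"]] = entry["after"]
    return db


backup = {"part-132": 10, "part-154": 5}
db, log = dict(backup), []
apply_update(db, log, "part-132", 7)     # update of an existing record
apply_update(db, log, "part-200", 3)     # addition of a new record
apply_update(db, log, "part-154", None)  # deletion of a record
current = dict(db)
recovered = roll_forward(backup, log)    # same state rebuilt from the copy
back_out(db, log)                        # data base returns to its prior status
```

Note that backout reads the log newest-first while recovery reads it oldest-first; the same log serves both purposes.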
5.  RECOVERY  CRITERIA


• The  criteria  for  judging  the  recovery  capabilities  of  the  DBMS  are: 

Who  has  responsibility  for  logging  or  recording  an  audit  trail  of  all  after 
images?  Ideally,  this  log  or  audit  trail  should  be  automatically  created 
by  the  DBMS  without  any  special  effort  on  the  part  of  user  programs. 
A DBMS  which  requires  user  requests  or  user  control  to  cause  this 
logging  implies  not  only  additional  programming  to  provide  recovery  but 
also  additional  program  maintenance. 

Recovery  of  a data  base  from  a prior  copy  is  generally  a utility  type 
function.  As  a minimum,  the  DBMS  system  should  provide  a copy  and 
restore  utility  to  produce  a copy  of  the  data  base  periodically  and  then 
restore  the  data  base  to  a particular  status  by  applying  all  after  images 
on  the  log  to  that  copy. 

An  additional  level  of  utility  support  may  provide  for  the  summarization
of  log  or  audit  trail  records.  This  is  a second  criterion  for  recovery
and  ensures  that  only  the  most  recent  version  of  each  data  base  record 
is  used  for  the  recovery  process.  This  has  the  effect  of  significantly 
reducing  the  recovery  time  by  avoiding  redundant  processing  of  data 
base  records  which  are  subsequently  further  updated  by  later  activity 
during  the  recovery  process. 

A third  criterion  is  the  identification  of  the  smallest  recoverable  unit. 
Some  DBMS  force  recovery  of  the  entire  data  base  in  the  event  of 
physical  damage  to  any  part  of  that  data  base.  The  extent  of  recovery 
necessary  should  be  defined  as  either  the  data  base,  a data  set  (or  file) 
of  the  data  base,  a disk  track,  or  a physical  record  within  a disk  track. 
The  amount  of  the  data  base  which  must  be  recovered,  of  course,  has  a 
direct  effect  on  the  recovery  time  necessary. 
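
The log summarization mentioned in the second criterion above can be sketched as follows. The log layout and the function name `summarize` are hypothetical; the point is only that one after image per record survives.

```python
def summarize(log):
    """Keep only the most recent after image for each record key."""
    latest = {}
    for entry in log:              # the log is in chronological order
        latest[entry["key"]] = entry["after"]
    return latest


log = [
    {"key": "part-132", "after": 9},
    {"key": "part-154", "after": 4},
    {"key": "part-132", "after": 7},   # supersedes the first entry
]
summary = summarize(log)
```

Recovery then applies two images instead of three; on a log with many repeated updates to the same records the saving is correspondingly larger.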
6.  BATCH  RESTART  CRITERIA


• The  criteria  for  restarting  batch  processing  following  a hardware  or  software
failure  are:


The  responsibility  for  logging  before  images  of  data  base  activity  should 
be  identified.  Ideally,  this  should  be  a DBMS  system-provided  function 
requiring  no  participation  on  the  part  of  the  user's  program. 

The  utility  support  necessary  to  accomplish  this  backout  should  be 
identified.  Again,  this  should  be  a DBMS  system-provided  utility  which 
requires  no  additional  user  programming. 


To  ensure  that  a data  base  is  restarted  in  the  shortest  possible  time  it 
should  be  possible  to  identify  restart  points  within  a batch  program. 
These  restart  points  define  periodic  synchronization  points  in  a DBMS
system  log  and  enable  the  backout  utility  to  back  out  all  processing
performed  since  the  most  recent  restart  point.  These  restart  points  are
sometimes  referred  to  as  "checkpoints." 
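
A sketch of backout to the most recent checkpoint is shown below. The log entry layout, the `position` field recorded in the checkpoint, and the helper names are illustrative assumptions rather than the format of any actual DBMS log.

```python
def back_out_to_checkpoint(db, log):
    """Undo logged updates, newest first, until a checkpoint is reached.

    Returns the restart position stored in that checkpoint, or 0 if the
    log holds no checkpoint and everything had to be backed out.
    """
    while log:
        entry = log.pop()
        if entry["type"] == "checkpoint":
            log.append(entry)          # keep the checkpoint on the log
            return entry["position"]
        if entry["before"] is None:
            db.pop(entry["key"], None)
        else:
            db[entry["key"]] = entry["before"]
    return 0


db = {"a": 1, "b": 2}
log = []


def update(key, value):
    """Log the before image, then apply the update."""
    log.append({"type": "update", "key": key, "before": db.get(key)})
    db[key] = value


update("a", 10)
update("b", 20)
log.append({"type": "checkpoint", "position": 2})  # two inputs fully processed
update("a", 11)                                    # failure occurs after this
restart_at = back_out_to_checkpoint(db, log)
```

Only the work done after the checkpoint is undone, and the batch program resumes at input position 2 rather than at the beginning.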


7.  ON-LINE  RESTART  CRITERIA 


• The  restart  of  an  on-line  program  requires  two  types  of  logging:

First,  there  is  the  logging  necessary  to  reflect  all  data  base  activity  and 
possible  back  out  of  partially  processed  on-line  transactions. 


Secondly,  since  an  on-line  environment  is  often  characterized  by  the
capture  of  transactions  and  data  at  the  point  of  origin,  there  may  not 
be  any  record  of  that  transaction  maintained  in  hard-copy  form  for 
subsequent  manual  reprocessing  or  re-entry  of  transactions  on  restart. 
Accordingly,  the  DBMS  should  provide  for  message  logging  such  that 
after  backing  out  partially  completed  processing  of  transactions,  a copy 
of  the  originally  entered  transaction  may  be  retrieved  from  a message 
log  and  used  to  reprocess  the  transaction  in  its  entirety. 
• One  restart  criterion  identifies  the  responsibility  for  on-line  message  logging  -
whether  it  is  provided  automatically  by  the  DBMS  or  whether  it  requires  user 
programming. 

• Another  criterion  is  whether  messages  are  recorded  on  the  same  physical  log 
as  data  base  activity.  Provided  the  same  log  is  used  for  both  message  and  data 
base  logging,  full  data  integrity  can  be  assured. 

If,  however,  a different  physical  log  is  used  for  messages  from  that  used
for  data  base  activity,  a possible  "window"  can  occur  when  message
activity  is  recorded  on  one  log  but  there  has  not  been  sufficient  time  to 
record  data  base  activity  on  the  other  log  (or  vice  versa). 

While  the  synchronization  of  these  two  logs  can  sometimes  be  achieved
by  the  user,  to  ensure  no  loss  of  integrity  this  synchronization  should  be
a DBMS-provided  function  and  should  not  be  the  responsibility  of  the
user  programmer.

8.  SYNCHRONIZATION  POINTS 

• Another  important  factor  is  the  ability  of  the  user  program  to  define  restart 
points  (or  "synchronization  points")  during  the  processing  of  the  transaction. 
For  example,  a complete  on-line  transaction  may  involve  the  entry  and 
processing  of  all  line  items  comprising  an  order.  Each  line  item  is  entered  as  a 
separate  on-line  transaction,  all  transactions  then  comprising  an  order. 

• In  the  event  of  a system  failure,  two  possibilities  arise: 

The  application  may  require  that  all  line  items  be  backed  out  to  the 
start  of  the  order.  This  case  may  arise  where  the  application  is  such 
that  if  an  order  cannot  be  completely  handled  at  the  time  it  is  placed 
by  a customer,  that  entire  order  may  be  cancelled  and  should  not  be 
reprocessed  on  system  restart. 
On  the  other  hand,  the  application  may  require  that  on  system  restart
the  order  be  continued  from  the  last  line  item  entered  and  accepted  by 
the  system.  In  this  case,  restart  should  involve  backout  only  of  any 
incomplete  processing  of  the  last  entered  transaction. 

• For  most  effective  use,  the  DBMS  should  enable  the  user  program  to  define 
restart  points  in  an  on-line  program. 

One  restart  point  may  be  at  the  start  of  an  on-line  program  (ensuring 
that  the  entire  program  activity  and  all  transactions  are  backed  out  on 
system  restart). 

Alternatively,  the  need  may  exist  to  define  additional,  intermediate 
restart  points  such  as  immediately  prior  to  requesting  each  subsequent 
line  item  transaction  from  the  terminal.  In  this  way,  backout  will  occur 
only  to  the  last  entered  transaction. 

• The  inability  of  a DBMS  to  offer  the  user  this  flexibility  of  defining  restart 
points  may  force  significantly  more  reprocessing  of  on-line  transactions  if 
restart  requires  the  full  backout  of  all  previous  processing  in  a partially 
completed  program  at  the  time  of  a failure. 

• An  additional  on-line  restart  criterion  is  the  ability  to  provide  a system  wide 
restart,  re-establishing  the  system  to  some  predefined  point.  This  may  be  a 
"warm  start"  capability  which  enables  the  on-line  system  to  periodically 
record  its  status  (similar  to  a batch  program  checkpoint)  and  restart  the 
system  at  that  point.  This  generally  involves  the  quiescing  of  all  on-line 
activity  to  enable  such  a "checkpoint"  to  be  taken. 

• This  quiescing  or  "quiet  point"  may  have  an  impact  on  on-line  response  time 
and  performance  depending  on  the  frequency  of  checkpoints  and  the  duration 
of  the  quiesce  period. 
In  addition,  a DBMS  which  supports  only  quiesced  checkpoints  for  restart  and
does  not  allow  restart  points  to  be  defined  within  a program  will  result  in
significantly  more  reprocessing  of  transactions  once  the  system  has  been
re-established.

A further  on-line  restart  criterion  is  the  ability  to  restart  individual  tasks. 
The  ability  to  back  out  and  restart  at  the  task  level  is  very  important.  In  this
way,  specific  task  processing  may  be  removed  from  the  system  without 
requiring  a complete  system  restart.  The  result  is  increased  on-line  system 
availability. 

The  final  criterion  in  on-line  restart  concerns  the  handling  of  message 
reprocessing  following  a system  restart.  As  discussed  earlier,  a DBMS  should 
ensure  that  all  input  messages  are  logged  and  possibly  ensure  that  all  output 
messages  are  also  logged. 

Logged  input  messages  can  be  used  for  reprocessing  transactions  that  have 
been  backed  out  during  system  restart.  A significant  consideration  in  this 
reprocessing  is  to  ensure  that  transactions  are  reprocessed  in  exactly  the  same 
sequence  as  they  had  been  processed  originally.  There  are  two  ways  in  which 
this  reprocessing  can  be  achieved: 

Record  all  the  necessary  processing  events  on  the  system  log  or  audit 
tape  and  reprocess  those  events  from  the  system  log  at  restart  time,  or 

Recreate  the  various  processing  events  by  recording  on-line  messages 
on  the  system  log,  followed  by  reprocessing  the  transactions  which  were 
received  between  the  particular  checkpoint  and  the  failure. 

The  first  approach  uses  the  system  log  to  ensure  that  all  activity  is 
reprocessed  in  exactly  the  same  chronological  sequence  as  it  originally 
occurred.  In  this  way,  the  exact  processing  conditions  can  be  recreated  on 
system  restart. 
• In  the  second  case,  however,  transactions  recorded  on  a system  log  appear
serially  on  that  log  and  do  not  necessarily  reflect  their  time  rate  of  arrival.
Thus,  two  transactions  may  originally  have  been  received  almost  simultaneously
and  may  have  originally  been  processed  concurrently.  On  the  other
hand,  another  two  transactions  on  the  system  log  may  have  been  received  with 
a considerable  time  in  between,  such  that  the  first  transaction  was  completely 
processed  before  the  second  transaction  started.  In  this  case,  concurrent 
processing  would  not  have  occurred  originally  and  the  different  "arrival  rate" 
of  reprocessed  transactions  may  affect  the  processing  logic.  This  is  illustrated 
in  the  following  example: 

A transaction  was  processed  to  reflect  the  receipt  of  additional 
quantities  of  a particular  part  into  a warehouse.  The  processing  of  that 
transaction  resulted  in  that  quantity  being  added  to  the  balance  on 
hand. 

Simultaneously  with  the  processing  of  the  inventory  receipt,  a customer 
order  was  received  against  that  part. 

Assuming  that  the  order  was  processed  concurrently  with  the  receipt,  if 
the  test  for  parts  availability  was  made  fractionally  after  the  receipt 
had  increased  the  balance  on  hand,  sufficient  quantity  is  reflected  in 
inventory  to  fill  the  order.  The  result  of  these  relative  arrival  rates  is 
the  production  of  the  packing  slip  and  the  update  of  accounts  receivable 
information  for  that  customer.  The  packing  slip  may  then  have  been 
transmitted  to  a printer  in  the  warehouse  prior  to  a system  failure. 

Following  a system  failure,  incomplete  processing  would  be  backed
out.  Thus,  if  the  receipt  processing  and  customer  order  processing  had
not  fully  completed  at  the  time  of  the  failure,  the  addition  of  the
inventory  receipt  to  the  balance  on  hand  will  be  backed  out,  as  also  will
the  updating  of  the  balance  on  hand  and  accounts  receivable  information
reflecting  the  placement  of  the  order.
The  problem  arises  in  reprocessing  the  transaction.

Because  of  the  time  interactions  in  a concurrent  multiprocessing  or 
multitasking  environment,  it  may  be  possible  that  the  customer  order 
test  for  parts  availability  on  reprocessing  is  made  before  the  inventory 
receipt  reprocessing  increments  the  balance  on  hand.  (This  could  occur 
with  slightly  different  relative  transaction  arrival  rates.)  At  this  time 
the  inventory  record  reflects  insufficient  quantity  on  hand  to  fill  the 
order,  the  result  being  the  generation  of  a back  order  for  that 
customer. 

Here  now,  a completely  different  result  has  occurred  through  reprocessing.
But  more  importantly  the  warehouse  storeman  may  not  have  been
aware  of  the  failure  of  the  system  and  may  have  started  picking  the
goods  from  the  earlier  produced  packing  slip  (prior  to  system  failure).
These  goods,  of  course,  may  be  shipped  to  the  customer  but  now  no  record  of
that  shipment  is  reflected  in  the  customer's  accounts  receivable  details
as  this  record  was  backed  out  on  system  restart  after  the  failure.
Furthermore,  the  original  order  was  back  ordered  on  reprocessing  after
system  restart  and  will  be  supplied  to  the  customer  again  at  a later  time.
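
The sensitivity to processing sequence in this example can be reproduced with a few lines of arithmetic. The quantities (2 on hand, a receipt of 10, an order for 8) and the function name `process` are invented for illustration; only the ordering effect comes from the text.

```python
def process(transactions, on_hand):
    """Apply receipts and orders in the given sequence."""
    results = []
    for kind, quantity in transactions:
        if kind == "receipt":
            on_hand += quantity
        elif on_hand >= quantity:      # order can be filled from stock
            on_hand -= quantity
            results.append("packing slip")
        else:                          # insufficient stock at this moment
            results.append("back order")
    return results, on_hand


# Original run: the receipt reached the balance just before the order's test.
original = process([("receipt", 10), ("order", 8)], on_hand=2)

# Reprocessing after restart: the order's test happens to run first.
replayed = process([("order", 8), ("receipt", 10)], on_hand=2)
```

The same two transactions produce a packing slip in one sequence and a back order in the other, which is why reprocessing in the original sequence matters.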

This  example  serves  to  illustrate  two  other  restart  criteria: 

First,  the  ability  to  ensure  the  reprocessing  of  transactions  in  exactly 
the  same  sequence  as  originally  occurred. 

Second,  and  more  importantly,  it  identifies  the  need  to  control  the 
delivery  of  output  messages  reflecting  processing  of  transactions. 

Ideally,  the  DBMS  should  ensure  that  the  output  message  (the  packing  slip  in 
our  example)  is  not  transmitted  until  all  processing  is  completed  and  the  data 
base  activity  has  been  committed  and  cannot  be  backed  out  in  the  event  of  a 
subsequent  hardware  or  software  failure. 
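
One way to picture this control over output messages is to hold them with the transaction until commit. The `Transaction` class and the list standing in for the warehouse printer are hypothetical; the sketch shows the idea, not any product's message handling.

```python
class Transaction:
    """Holds output messages until the data base activity is committed."""

    def __init__(self, delivered):
        self.delivered = delivered  # stands in for the terminal or printer
        self.pending = []           # messages not yet released

    def send(self, message):
        self.pending.append(message)

    def commit(self):
        # Only now is it safe for the outside world to act on the output.
        self.delivered.extend(self.pending)
        self.pending.clear()

    def back_out(self):
        # Discard output along with the backed-out data base updates.
        self.pending.clear()


printer = []
order = Transaction(printer)
order.send("packing slip: 8 x part-132")
before_commit = list(printer)   # nothing has reached the warehouse yet
order.back_out()                # failure before commit: the slip is discarded

retry = Transaction(printer)
retry.send("back order: 8 x part-132")
retry.commit()
```

With output deferred in this way, the storeman in the example would never have seen the packing slip that was later invalidated.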
F.  DATA  SECURITY


• Data  security  relates  to  the  protection  of  the  data  resource  from  unauthorized 
use.  In  effect,  data  security  implies  a mechanism  to  control  the  access  to 
data. 

• Two  levels  of  security  control  are  required  in  the  data  base/data  communication
environment:

The  first  level  of  security  controls  the  ability  of  on-line  terminals  to 
access  application  programs.  This  level  determines  which  users  have 
access  to  which  transaction  programs  and  is  implemented  in  the  data 
communication  portion  of  a DBMS. 

The  second  level  of  security  is  applied  between  the  application  program 
and  the  data  base.  This  level  of  security  controls  that  portion  of  the 
data  base  to  be  made  available  to  a particular  application  program  and 
also  controls  the  operations  which  can  be  performed  on  that  data.  This 
level  of  security  is  provided  by  the  DBMS. 

• The  ability  to  control  the  access  to  data  requires  both  an  access  restriction 
mechanism  as  well  as  the  ability  to  enforce  that  restriction.  In  evaluating  the 
security  criteria  of  a DBMS,  the  restriction  mechanism  and  the  enforceability 
of  that  restriction  mechanism  must  be  addressed. 

• Several  different  types  of  access  mechanisms  are  available  in  DBMS  including: 

Passwords. 

Security  levels. 

Specific  permission. 
• The  "password"  approach  requires  the  knowledge  of  a key  or  password  in  order
to  gain  access  to  a piece  of  data  for  which  a lock  has  been  defined.  It  is
analogous  to  the  combination  of  a safe,  whereby  only  those  people  with  a
knowledge  of  the  combination  are  able  to  access  the  contents  of  the  safe.  The
lock  on  the  safe  becomes  the  restriction  mechanism.

• A "security  level"  approach  applies  a security  value  to  the  various  elements  to
be  protected.  Users  are  then  provided  different  levels  of  security  which
enable  them  to  access  all  elements  with  the  security  value  equal  to,  or  perhaps
below,  their  particular  security  levels.  Thus,  once  a user  satisfies  the  access
requirement  for  a given  security  level,  he  can  access  all  elements  of  data
at  that  security  level.  This  is  equivalent  to  a master  key  which  allows  access
to  all  rooms  fitted  with  a lock  operated  by  that  master  key.

• "Specific  permission"  differs  from  the  previous  approaches  in  that  each 
individual  task  is  granted  unique  permission  to  access  an  element  of  a data 
base.  That  permission  is  valid  for  that  task  only.  Separate  permission  must  be 
obtained  for  a different  task  if  it  is  desired  to  access  the  same  or  different 
data  at  another  time.  This  is  analogous  to  a guard  at  the  door,  specifically 
authorizing  entry  to  a room  each  time  an  individual  approaches  the  door. 
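
The three access mechanisms can be contrasted in a short sketch. The function names, the numeric security levels and the `grants` set are all hypothetical, chosen only to show how the checks differ in shape.

```python
def password_check(supplied, lock):
    """Password approach: access requires knowing the key for the lock."""
    return supplied == lock


def level_check(user_level, element_level):
    """Security level approach: access to everything at or below the level."""
    return user_level >= element_level


def permission_check(grants, task, element):
    """Specific permission: each (task, element) pair is authorized singly."""
    return (task, element) in grants


grants = {("task-17", "salary")}

by_password = password_check("opensesame", "opensesame")
by_level = level_check(user_level=3, element_level=2)
too_low = level_check(user_level=1, element_level=2)
granted = permission_check(grants, "task-17", "salary")
other_task = permission_check(grants, "task-18", "salary")
```

The last line shows the distinguishing property of specific permission: a grant made to one task gives a different task nothing, even for the same data.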

• Consideration  must  be  given  to  the  level  to  which  the  security  restriction 
applies.  Restriction  mechanisms  can  apply  to  the: 

Data  base. 

Data  set  or  file. 

Record  type. 

Field. 

• Generally,  restriction  at  the  field  level  offers  a greater  degree  of  security
than  at  the  data  base  level.  With  restriction  at  the  data  base  level,  once  a
user  is  given  access  to  the  data  base,  he  is  able  to  view  and  operate  on  all
parts  of  the  data  base  without  restriction.  The  field  level  restriction  requires
the  user  to  satisfy  the  security  requirements  of  each  individual  field  before
he  is  given  access  to  that  field.

• Another  consideration  is  the  processing  options  permitted  once  access  is 
granted  to  data.  The  ability  to  distinguish  between  read  access,  write  access, 
addition  and  deletion  access  enables  additional  levels  of  security  to  be 
controlled.  Thus,  only  a small  subset  of  all  of  the  people  permitted  to  read 
(view)  data  may  be  permitted  to  write  (update)  that  data. 

• The  final  criterion  regarding  data  security  is  the  responsibility  for  enforcement
of  data  security.

For  example,  if  the  programmer  provides  the  password,  he  must  know
the  password.  He  can  therefore  use  the  password  from  a
program  which  should  not  have  access  to  a given  piece  of  data.

Additionally,  in  the  specific  permission  approach,  for  example,  if  the 
programmer  is  able  to  decide  what  elements  he  will  access,  there  is 
essentially  no  way  to  enforce  the  security  of  the  system. 

• The  latter  approach  is  a decision  made  at  the  time  of  writing  the  program
whereby  the  programmer  is  given  permission  to  access  various  elements  of 
data.  Once  that  permission  is  given,  unless  the  data  base  system  provides 
additional  levels  of  enforceability,  there  is  no  way  to  further  enforce  the 
security  of  the  system. 

• The  DBMS  may  control  this  enforceability  by  ensuring  that  the  program  is  only 
able  to  access  a defined  subset  of  the  data  base,  and  then  only  access  it  in 
certain  defined  ways;  for  example,  in  read  only  mode,  but  not  update  mode. 
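
DBMS-side enforcement of this kind can be sketched as a restricted view handed to the program in place of the data base itself. The class `ProgramView` and the `allowed` table are illustrative assumptions; in practice the definition would be set up outside the program, for example by the Data Base Administrator.

```python
class ProgramView:
    """The only path a program has to the data; checks every operation."""

    def __init__(self, db, allowed):
        self._db = db
        self._allowed = allowed  # field name -> set of permitted operations

    def read(self, field):
        if "read" not in self._allowed.get(field, set()):
            raise PermissionError(field)
        return self._db[field]

    def write(self, field, value):
        if "write" not in self._allowed.get(field, set()):
            raise PermissionError(field)
        self._db[field] = value


db = {"name": "Smith", "salary": 900, "rating": "B"}
view = ProgramView(db, {"name": {"read", "write"}, "salary": {"read"}})

salary = view.read("salary")        # permitted: read-only access
try:
    view.write("salary", 1000)      # refused: update mode was not granted
    update_refused = False
except PermissionError:
    update_refused = True
```

Because the restriction lives in the view and not in the program's own code, the programmer cannot widen it by changing the program.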

• Additionally,  the  use  of  passwords  enables  those  passwords  to  be  changed  from 
time  to  time  to  ensure  that  security  is  not  violated.  This  approach  is  generally 
most  applicable  in  associating  passwords  with  on-line  transactions  and  programs
such  that  only  those  users  with  the  knowledge  of  the  current  password
are  able  to  access  the  appropriate  programs  or  data  bases.


G.  DBMS  EASE  OF  USE 


1.  USER  DEFINITION

• The  first  group  of  users  of  a DBMS  system  are  the  end  users.  These  are 
normally  user  department  personnel  who  typically  use  a DBMS  to  carry  out 
their  various  assigned  responsibilities.  This  class  of  user  is  generally  large  in 
number  but  with  a low  DBMS  skill  level,  as  shown  in  Exhibit  III-3.

• The  second  class  of  user  is  the  application  programmer.  These  naturally  are 
fewer  in  number  than  the  user  department  personnel  but  require  a higher  skill 
level  sufficient  to  enable  them  to  utilize  the  DBMS  services  in  coding  various 
application  programs. 

• The  third  class  of  user  is  very  small  in  number  with  a high  degree  of  skill.  It  is 
this  class  of  user  who  designs,  organizes  and  controls  the  DBMS  for  use  by 
application  programmers  and  user  department  personnel.  This  class  of  user  is 
referred  to  as  a Data  Base  Administrator  and/or  a data  communications 
administrator. 

• A DBMS  system  should  be  easy  to  use  by  each  of  the  three  classes  of  users. 
However,  ease  of  use  is  in  varying  degrees  dependent  upon  the  complexity  of 
the  function. 

• Ease  of  use  for  the  user  department  personnel  relates  generally  to  the  ease  of 
utilizing  on-line  terminals  and  application  programs  to  access  information  in 
the  data  base.  It  also  refers  to  the  ability  for  user  department  personnel  to 
retrieve  information  from  the  data  base  to  satisfy  spontaneous  or  ad-hoc 
requests  for  information.  The  ability  to  easily  obtain  information  from  a data
base  is  aided  by  a query  language  which  may  be  very  similar  to  English  and
which  ideally  requires  no  prior  programming  experience  to  use.

EXHIBIT  III-3

RELATIONSHIP  OF  DBMS  USER  GROUPS  AND  SKILL  LEVEL

(Figure  not  reproduced.  User  groups  shown:  DBA,  application  programmer,  end  user.)

• The  second  class  of  user,  the  application  programmer,  is  concerned  with  ease 
of  use  of  a DBMS  such  that  he  can  concentrate  on  the  application  logic 
requirements  in  coding  programs  without  being  concerned  about  physical  data 
base  organization,  storage  considerations,  physical  terminal  or  network  con- 
siderations. 

• The  Data  Base  Administrator  class  of  user  has  the  full  responsibility  for 
organization  and  control  of  the  data  base  and  requires  a number  of  tools  to 
assist  him  in  carrying  out  his  function.  (Similarly,  the  data  communications 
administrator  has  full  responsibility  for  organization  and  control  of  terminals 
and  data  communications  network  facilities.) 

2.  EASE  OF  USE  FOR  THE  DATA  BASE  ADMINISTRATOR 

• Initial  installation  and  growth  are  two  areas  not  necessarily  complementary  as 
far  as  ease  of  use  for  the  Data  Base  Administrator  is  concerned.  In  fact,  many 
DBMS  available  today  stress  ease  of  installation  over  ease  of  growth: 

Compared  with  a non-DBMS  approach,  such  systems  may  enable  an 
installation  to  grow  significantly  faster  than  would  otherwise  be  pos- 
sible without  DBMS. 

However,  unless  the  DBMS  allows  easy  growth  into  new  application 
areas,  by  the  addition  of  new  data  to  the  data  base  without  causing 
disruption  of  existing  data  and  application  programs,  ease  of  initial 
installation  may  impede  subsequent  growth. 

• Thus, a DBMS which may perhaps involve more effort in initial installation
may represent considerably more long-term ease of use than one which
permits easy installation but introduces later problems in the ability to grow.

• A DBMS is a significant investment by any company. Short term gains should
not be accepted if they may introduce long-term growth problems. The
objective of the DBMS system is to permit easy long-term application growth.


• A DBMS may provide a number of tools which ease the effort necessary during
initial installation and subsequent growth. Some of the important Data Base
Administrator tools which should be addressed by a DBMS system are
identified in Exhibit III-4.


• The first task of the Data Base Administrator is to design the data base.
Depending on the complexity of the initial data base, a design evaluation aid
may be an ease of use tool. However, the availability of such design
evaluation becomes far more important as the data base grows in scope and
complexity. The various design alternatives need to be assessed to determine
their impact not only on the future data base system but also on existing
data base applications when they move to the future data base environment.

• The Data Base Administrator defines the data base by means of a data
definition language. In initial installation of a data base there will be a
learning factor to consider while the Data Base Administrator becomes familiar
with the data definition language. Thus a data definition language that is easy
to understand becomes an important factor during initial installation. The
importance becomes less once the Data Base Administrator has gained
proficiency.

• Another consideration regarding the data definition language is the degree of
function provided by the data base system. A greater degree of function
than is needed initially may at first be a disadvantage, in that it introduces
extra complexity during the learning time of the Data Base Administrator.
However, the additional functions may well be one of the most significant
factors in permitting easy data base application growth later in the
installation, at which point they become far more important as an ease of
use factor.

EXHIBIT III-4


DATA BASE ADMINISTRATOR TOOLS


                                              IMPORTANCE
TOOL                                   INITIAL         DATA BASE
                                       INSTALLATION    GROWTH

DESIGN EVALUATION                           I              M
DATA DEFINITION LANGUAGE TYPE               M              I
DATA DEFINITION LANGUAGE FUNCTION           L              M
PERFORMANCE MEASUREMENT                     L              M
DOCUMENTATION & CONTROL                     I              M
CONVERSION                                  M              I
DATA BASE RESTRUCTURING                     I              M
EDUCATION                                   I              I
DOCUMENTATION                               I              I

(LEGEND: I-IMPORTANT; M-MOST IMPORTANT; L-LEAST IMPORTANT)

• The more functions provided by a DBMS, the less may need to be developed by
the installation to support additional capabilities beyond the scope of the
system. At this time full functional support becomes a very positive ease of
use factor.

• The next Data Base Administrator ease of use factor is the facility for
performance measurement. Performance measurement is a capability which
may be useful at initial installation time to estimate or measure the
performance of the initial data base and data communication design. It
becomes a very significant factor for the maintenance of an efficient level of
operation and performance as the demands on the system grow and change.

• Perhaps  the  most  significant  task  of  the  Data  Base  Administrator  is  in  the 
area  of  documentation  and  control  of  the  data  base.  The  problems  involved 
expand  in  direct  proportion  to  the  size  and  use  of  the  data  base.  In  this  area, 
the  availability  of  a data  dictionary  becomes  almost  essential  to  the  effective 
performance  of  this  task. 

• A data dictionary allows the computer to maintain control of all data elements
in a data base and identify all applications, programs and components which
use specific data. This may not be a significant control problem during
initial installation. However, unless the DBMS is developed from the outset
with a data dictionary in mind, the interacting problems resulting from change
and growth of that DBMS installation and its applications may severely inhibit
the ability to grow.

Without  some  data  dictionary  capability,  either  manual  or  automated, 
problems  arise  as  the  data  base  grows  due  to  changes  being  made 
without  knowledge  of  all  of  the  various  personnel  impacted  by  that 
change. 

A data  dictionary  enables  all  organizational  elements  to  be  identified 
prior  to  a change  being  made  and  thus  minimizes  non-productive 
changes. 
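
The cross-referencing role of a data dictionary described above can be
illustrated with a minimal Python sketch. The class, the element names and the
program names are hypothetical and serve only as an illustration; this is not
the dictionary format of any product discussed in this report.

```python
from collections import defaultdict


class DataDictionary:
    """Toy cross-reference of data elements to the programs that use them."""

    def __init__(self):
        # element name -> set of program names (assumed structure)
        self._users = defaultdict(set)

    def register(self, program, elements):
        """Record that `program` reads or updates each element in `elements`."""
        for element in elements:
            self._users[element].add(program)

    def impacted_by(self, element):
        """Return the programs to review before `element` is changed."""
        return sorted(self._users.get(element, set()))


dictionary = DataDictionary()
dictionary.register("PAYROLL01", ["EMPLOYEE-NAME", "SALARY"])
dictionary.register("SKILLS02", ["EMPLOYEE-NAME", "SKILL-CODE"])

# Before changing EMPLOYEE-NAME, list every program that depends on it.
affected = dictionary.impacted_by("EMPLOYEE-NAME")
print(affected)
```

Looking up an element before it is changed identifies every affected program,
which is the control that is otherwise lost as the data base grows.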
• One of the initial tasks for the Data Base Administrator is often to convert
existing applications to a DBMS environment. Any aids which ease this
conversion will be particularly important initially.

• As  the  data  base  grows,  the  need  to  restructure  that  data  base  will  also 
become  more  significant.  Utilities  or  tools  to  assist  in  data  base  restructuring 
make  growth  easier  for  the  Data  Base  Administrator. 

• Finally, the DBMS system should provide good education and documentation
facilities to ensure full ease of use of the system as provided.

3.  EASE  OF  USE  FOR  THE  APPLICATION  PROGRAMMER 

• The  application  programmer  has  a number  of  primary  tasks  in  which  ease  of 
use  facilities  will  apply.  These  tasks  generally  include  initial  coding,  testing 
and  debugging,  and  program  maintenance. 

• Therefore,  application  programmer  ease  of  use  tools  should  exist  in  the 
following  areas: 

Data Manipulation Language: The data manipulation language is the
language used by the application programmer to request information
from the DBMS.

Testing and Debugging Aids: Testing and debugging aids are tools which
can assist the application programmer in identifying faulty logic within
the program.

Degree of Independence: Data independence has been previously
discussed and has a very significant effect on the amount of program
maintenance which may be required on the part of the application
programmer, as the system grows and new applications are added.

• The data manipulation language should exhibit simplicity for the application
programmer. This simplicity can be indicated by the language type and the
total number of operations.

The  language  type  may  be  procedural  with  verbs  requesting  data  base 
operations  or  a CALL  type  language  with  data  base  calls  invoking 
various  data  base  functions.  Both  types  are  acceptable  but  should  be 
able  to  be  used  from  a high  level  language  such  as  COBOL  or  PL/I. 

The total number of operations supported by a DBMS can be an indication
of the level of support provided by that system. Generally, a higher
level DBMS will demand fewer different types of operations from the
application programmer than a lower level system.

• A data  manipulation  language  that  provides  different  operation  types  for  use 
by  the  application  programmer  in  retrieving  different  data  from  the  data  base 
may  imply  a need  for  greater  knowledge  of  the  data  base  structure  than  a data 
manipulation  language  which  offers  a high  degree  of  function. 

• Generally,  the  more  different  operation  types  an  application  programmer  must 
know  to  use  a DBMS,  the  more  the  application  program  becomes  tied  to  the 
particular  DBMS  design.  The  implication  of  this  is  possibly  a greater  need  to 
modify  the  application  programs  if  that  design  must  change. 

• On  the  other  hand,  a data  manipulation  language  which  provides  a higher  level 
of  support  by  requiring  the  application  programmer  to  use  fewer  operation 
types  generally  will  be  less  dependent  upon  the  particular  data  base  design  and 
perhaps  less  affected  by  possible  future  changes  made  to  the  data  base  for 
application  reasons. 

• To evaluate the power of the data manipulation language, a number of criteria
can be considered:

Number of repeating group types that can be processed in one data base
request.

Data  search  capabilities  of  the  data  manipulation  language. 

Facilities  for  processing  multiple  record  types  for  each  data  base 
command  as  opposed  to  one  which  only  processes  a single  record  type 
for  each  command. 

• In  evaluating  the  data  search  capabilities,  the  ability  for  the  application 
programmer  to  specify  high-low-equal  search  logic,  Boolean  search  logic  to 
combine  a number  of  related  high-low-equal  search  requests,  and  the  ability  to 
apply  search  requests  across  multiple  record  types  are  significant  in  easing  the 
application  programming  burden.  Such  a capability  in  a data  manipulation 
language  can  indicate  a high  level  of  function  in  the  DBMS  system  and  can 
release  the  application  programmer  from  the  need  to  include  program  logic  to 
locate  and  check  for  the  presence  of  the  various  conditions.  This  is  a function 
which  should  be  carried  out  by  the  DBMS  on  request  from  the  program. 
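
The search capability described above can be sketched in Python as follows.
This is an illustrative assumption of how a program might state high-low-equal
conditions and combine them with Boolean logic, leaving the checking to a
stand-in for the DBMS. The record layout, field names and function names are
hypothetical and do not represent the data manipulation language of any
product.

```python
import operator

# Hypothetical personnel records; field names are illustrative only.
EMPLOYEES = [
    {"name": "ADAMS", "salary": 18000, "dept": "SALES"},
    {"name": "BAKER", "salary": 25000, "dept": "ENGINEERING"},
    {"name": "CLARK", "salary": 31000, "dept": "SALES"},
]

# High-low-equal comparisons available to the requesting program.
COMPARISONS = {"LT": operator.lt, "EQ": operator.eq, "GT": operator.gt}


def condition(field, op, value):
    """Build a single high-low-equal test on one field."""
    compare = COMPARISONS[op]
    return lambda record: compare(record[field], value)


def all_of(*tests):
    """Boolean AND of several tests."""
    return lambda record: all(test(record) for test in tests)


def any_of(*tests):
    """Boolean OR of several tests."""
    return lambda record: any(test(record) for test in tests)


def search(records, test):
    """Stand-in for the DBMS: return the records that satisfy the request."""
    return [record for record in records if test(record)]


# The program states what it wants; it contains no record-checking logic.
request = all_of(condition("dept", "EQ", "SALES"),
                 condition("salary", "GT", 20000))
matches = [record["name"] for record in search(EMPLOYEES, request)]
print(matches)
```

The application program only composes the request; locating and testing
records is left to the `search` function, which plays the part of the DBMS.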

4.  EASE  OF  USE  FOR  THE  END  USER 

• The  end  users,  who  are  generally  the  largest  class  of  users  with  the  lowest  skill 
level,  should  be  able  to  utilize  the  DBMS  easily  without  any  special  training. 
The  existence  of  both  batch  and  terminal  capabilities  in  a DBMS  can  enable 
end  users  to  carry  out  their  functions  against  the  data  base  using  on-line 
terminals.  The  existence  of  query  languages  (which  enable  the  end  user  to 
pose  ad-hoc  queries)  is  also  a significant  ease  of  use  factor. 

• Finally,  the  availability  of  application  packages  that  address  common  user 
department  requirements  is  an  important  consideration.  For  example,  applica- 
tion packages  that  address  text  processing,  information  retrieval  and  business 
planning,  can  help  satisfy  the  requirements  of  many  different  industries. 
Further, the DBMS may provide application support for specific industries.


H. COST FACTORS


• When two or more DBMS are evaluated, it is very rare that they all provide
the same degree of function, data independence, integrity, security and ease
of use. Yet, an effective evaluation must consider each of these factors.

• This  implies  that  some  estimate  must  be  made  of  the  intangible  costs  which 
may  be  associated  with  certain  missing  functions  in  one  DBMS  system  when 
compared  to  another. 

1. INTANGIBLE COSTS


• Intangible  costs  from  limitations  in: 


Function  may  result  in  additional  expenditures  due  to  longer  program 
development  time,  as  that  function  may  need  to  be  coded  by  the 
installation's  programmers,  and  then  tested  and  debugged. 


Data  independence  may  cause  an  expenditure  increase  because  of  the 
possibility  of  processing  wrong  information  due  to  integrity  exposures 
introduced  by  the  DBMS  system. 


Data  security  can  result  in  increased  cost  due  to  a longer  program 
development  time  than  would  otherwise  be  the  case. 

• However,  if  the  other  functions  are  essentially  equal,  then  the  intangible  costs 
will  also  tend  to  be  equal.  In  this  case,  the  measurable  cost/performance 
factors  will  constitute  valid  evaluation  criteria. 


2.  MEASURABLE  COST  FACTORS 


• The obvious measurable cost factors include the price of the DBMS package
and whether the package is available only on a purchase basis or whether it
can be leased. Any installation and maintenance charges should be considered
in the cost of the package.

• An  additional  cost  factor  which  can  be  measured  is  the  amount  of  main 
storage  required  for  the  first  user  in  the  system  (which  of  course  may  require 
the  full  DBMS  to  be  present),  and  then  the  incremental  main  storage  cost  as 
each  additional  user  is  added  to  the  system. 
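
The main storage calculation described above amounts to a fixed requirement
for the first user plus an increment for each additional user. A minimal
Python sketch follows; the function name and the storage figures are
hypothetical and are not taken from any product.

```python
def main_storage_required(users, first_user_kb, per_additional_user_kb):
    """Estimate main storage (in kilobytes) for a number of concurrent users."""
    if users < 1:
        return 0
    # The first user carries the full DBMS; each further user adds an increment.
    return first_user_kb + (users - 1) * per_additional_user_kb


# Illustrative figures only.
estimate = main_storage_required(users=10, first_user_kb=200,
                                 per_additional_user_kb=12)
print(estimate)
```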

• Other  installation-dependent  costs,  such  as  the  amount  of  disk  storage 
required  (which  is  a factor  of  the  size  of  the  data  base),  the  CPU  utilization  (a 
factor  of  the  processing  load),  and  the  education  costs  (depending  on  the 
number  of  installation  personnel)  should  also  be  considered. 

3.  DBMS  COST  JUSTIFICATION 

• In carrying out a cost justification, estimates should be made of the intangible
costs together with the measurable costs. These costs must be offset against
the benefits introduced by applications which will use the DBMS as illustrated
in Exhibit III-5.

• A decision to use a particular Data Base Management System is an investment
decision which may impact the profitability of the company as much as
the purchase of a large piece of machinery.

• As with any investment decision, a decision for a particular DBMS should
consider not only the initial installation costs (which include installation of
the DBMS, the data base itself, and development of the applications), but also
the running cost for continued operation of the data base/data communication
applications. This running cost should include a proportion of the CPU, disk
storage and personnel costs associated with running the installation.

• These costs should be evaluated against the benefits gained from moving the
particular applications into a DBMS environment. Generally, these benefits
will only be realizable when the DBMS is operational.

EXHIBIT III-5


DBMS COST/BENEFIT CONSIDERATIONS

• Thus, in any DBMS implementation there will be an initial cash outflow during
development of the applications which will subsequently be offset by cash
inflows resulting from benefits gained when the applications become
operational.

• The cost justification analysis then becomes a matter of determining the
break-even point for the DBMS applications and the return on investment.

• While many financial techniques are used to determine this break-even point,
it is important to use a technique, such as discounted cash flow, which
considers the time value of money, particularly in times of high inflation such
as during the mid-1970s.
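
The discounted cash flow technique mentioned above can be sketched in Python
as follows. The cash flow figures and discount rates are hypothetical, chosen
only to show how discounting can delay the break-even point; they do not come
from this report.

```python
def discounted_cash_flows(cash_flows, rate):
    """Discount each period's net cash flow back to period 0."""
    return [flow / (1 + rate) ** period
            for period, flow in enumerate(cash_flows)]


def net_present_value(cash_flows, rate):
    """Sum of all discounted cash flows."""
    return sum(discounted_cash_flows(cash_flows, rate))


def break_even_period(cash_flows, rate):
    """First period whose cumulative discounted cash flow is non-negative.

    Returns None if break-even is not reached within the given periods.
    """
    cumulative = 0.0
    for period, value in enumerate(discounted_cash_flows(cash_flows, rate)):
        cumulative += value
        if cumulative >= 0:
            return period
    return None


# Hypothetical yearly figures: outflows during development, inflows afterwards.
flows = [-100_000, -50_000, 60_000, 90_000, 80_000]

undiscounted = break_even_period(flows, 0.0)
at_ten_percent = break_even_period(flows, 0.10)
at_twenty_percent = break_even_period(flows, 0.20)
print(undiscounted, at_ten_percent, at_twenty_percent)
```

With these assumed figures, ignoring the time value of money suggests
break-even one period earlier than a 10% discount rate does, and at 20% the
investment does not break even within the five periods considered.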

I. PERFORMANCE FACTORS


• In an environment where only a single user at a time is being processed, the
performance of a system is generally evaluated in terms of the elapsed time
for that processing. However, in a multi-user environment where several user
transactions may be processed concurrently, a valid measure of performance is
the number of transactions per unit of time, commonly known as throughput.

• It is generally quite difficult to predict the performance of a DBMS prior to its
implementation. However, the various potential constraints on the
performance of such a system should be considered. These are:

Architecture  of  the  DBMS. 

Number  of  Input/Output  accesses  required  per  transaction. 

Power  of  the  CPU  on  which  the  system  is  implemented. 

• There are a number of factors that can affect these constraints.


1. DBMS ARCHITECTURE


• The architecture of the DBMS generally has the most significant effect on
performance. The factor with the greatest effect is the ability to process
multiple requests simultaneously, known as a multi-thread capability.

• If  only  one  transaction  can  be  processed  at  a time  (single  thread),  other 
transactions  are  kept  waiting  until  the  first  transaction  has  completed  its 
processing  in  the  DBMS. 

• To  ensure  that  a multi-thread  capability  can  be  provided,  the  DBMS  and 
application  program  should  be  written  using  re-entrant  coding  techniques. 


2.  INPUT/OUTPUT  POWER 


• Generally,  on  medium  to  large  CPUs,  the  first  constraint  encountered  is  the 
I/O  power  of  the  system.  On  these  systems,  factors  which  reduce  the  number 
of  I/Os  required  per  transaction  will  have  the  most  positive  effect  on  the 
throughput  of  the  system.  By  reducing  the  number  of  I/Os  per  transaction,  one 
theoretically  can  increase  the  throughput  of  the  system  up  to  a limit  which  is 
represented  by  the  instruction  processing  capacity  of  the  CPU. 
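
The relationship described above can be expressed as a simple two-resource
model, sketched below in Python. The model is an illustrative assumption (it
ignores queueing and overlap effects), and all capacity figures are
hypothetical.

```python
def max_throughput(io_per_second, ios_per_transaction,
                   instructions_per_second, instructions_per_transaction):
    """Upper bound on transactions per second under a two-resource model."""
    io_limit = io_per_second / ios_per_transaction
    cpu_limit = instructions_per_second / instructions_per_transaction
    # Throughput cannot exceed whichever resource is exhausted first.
    return min(io_limit, cpu_limit)


# Hypothetical system: 120 I/Os per second, 1,000,000 instructions per second,
# 50,000 instructions per transaction.
io_bound = max_throughput(120, 12, 1_000_000, 50_000)
cpu_bound = max_throughput(120, 4, 1_000_000, 50_000)
print(io_bound, cpu_bound)
```

Under these assumptions, cutting the I/Os per transaction from 12 to 4 raises
the bound from 10 to 20 transactions per second, at which point the CPU rather
than the I/O capacity becomes the limit.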


• Some  of  the  factors  which  can  affect  the  number  of  I/O  operations  per 
transaction  are: 


I/O  buffering  technique. 

Facilities  to  group  associated  records. 

Method  for  establishing  record  interrelationships. 

Capability  to  choose  an  access  method  which  will  match  the  required 
data  use. 

• The support of a common buffer pool by a DBMS can significantly reduce the
number of I/Os necessary in processing data.

The technique used is to keep data base records in main storage for as
long as possible, until that main storage is required for additional data
base record requests.

The  philosophy  behind  this  approach  is  that  once  a set  of  data  base 
records  has  been  retrieved  in  main  storage  there  is  a high  probability 
that  another  record  in  the  set  may  be  required  again  in  a very  short 
period  of  time  for  additional  processing. 

• Thus,  many  DBMS  use  a "frequency  of  reference"  technique  such  that  data 
base  records  reside  in  the  buffer  pool  until  that  space  is  needed  for  new 
records. 

At  this  time,  the  storage  space  used  by  the  oldest  record  in  the  buffer 
pool  may  be  utilized  to  store  new  data  base  records. 

If that old data base record has been updated while in the buffer pool,
it is written back to the data base before the new record is read in.

Otherwise  the  space  occupied  by  the  old  record  is  freed  and  the  new 
record  is  immediately  read  in. 
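
The buffer pool behavior described above can be sketched in Python as follows.
This is a minimal illustration, assuming that the "oldest" record means the
least recently referenced one; the class and its counters are hypothetical and
do not describe the buffer management of any particular DBMS.

```python
from collections import OrderedDict


class BufferPool:
    """Toy common buffer pool with write-back of updated records."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict standing in for the data base
        self.buffers = OrderedDict()  # key -> [record, updated], oldest first
        self.reads = 0
        self.writes = 0

    def _load(self, key):
        if key in self.buffers:
            self.buffers.move_to_end(key)  # referenced again: no I/O needed
            return self.buffers[key]
        if len(self.buffers) >= self.capacity:
            old_key, (old_record, updated) = self.buffers.popitem(last=False)
            if updated:                    # write back before reusing the space
                self.disk[old_key] = old_record
                self.writes += 1
        self.reads += 1                    # one read I/O for the new record
        self.buffers[key] = [self.disk[key], False]
        return self.buffers[key]

    def get(self, key):
        return self._load(key)[0]

    def update(self, key, record):
        entry = self._load(key)
        entry[0], entry[1] = record, True


disk = {"A": 1, "B": 2, "C": 3}
pool = BufferPool(capacity=2, disk=disk)
pool.get("A")         # read from disk
pool.update("A", 10)  # already in the pool: no I/O
pool.get("B")         # read from disk
pool.get("A")         # still in the pool: no I/O
pool.get("C")         # pool full: B is oldest and unchanged, so it is dropped
pool.get("B")         # pool full: A is oldest and updated, so it is written back
print(pool.reads, pool.writes, disk["A"])
```

Six requests are satisfied with four reads and one write, because two of the
requests find their record already in main storage.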

• Other  techniques  which  attempt  to  optimize  I/O  performance  result  in  the 
DBMS  selecting  the  oldest  data  base  record  which  is  closest  to  the  current 
position  of  the  appropriate  disk  storage  access  arm  to  minimize  disk  seek 
activity. 

• Another factor which can influence performance is the support of repeating
groups. In this instance, repeating groups which reside in a certain section of
the data base may be so closely adjacent that they are brought into main
storage as part of a data base record needed to satisfy a request for an earlier
repeating group. In this case, when the later repeating group types are
needed, they may already be in main storage, bypassing the need for extra I/O
operations.

• The  type  of  relationships  established  between  data  base  records  also  influences 
performance.  Where  data  base  records  are  related,  there  may  need  to  be 
additional  I/O  operations  to  retrieve  all  associated  related  records. 

Use  of  direct  relationships  enables  the  DBMS  to  traverse  between 
repeating  group  types  with  a minimum  number  of  I/Os. 

Direct  pointers  may  enable  the  DBMS  to  move  immediately  to  the 
appropriate  part  of  the  data  base  using  the  shortest  possible  I/O 
operation. 

On the other hand, indirect relationships such as the use of symbolic
pointers (a data base key, for example) may require additional I/O
operations to locate the associated data base records by accessing an
index.
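
The difference between direct and symbolic pointers can be illustrated by
counting I/O operations in a small Python sketch. The sketch assumes a
single-level index held in one block and no buffering; the block layout, keys
and function names are hypothetical.

```python
class Disk:
    """Counts block reads; each read stands for one I/O operation."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.io_count = 0

    def read(self, address):
        self.io_count += 1
        return self.blocks[address]


# Block 0 is an index mapping symbolic keys to block addresses (assumed layout).
disk = Disk({
    0: {"SKILL-17": 2},
    1: {"name": "ADAMS", "skill_address": 2, "skill_key": "SKILL-17"},
    2: {"skill": "WELDING"},
})


def follow_direct(disk, owner):
    """Direct pointer: the owner record holds the related record's address."""
    return disk.read(owner["skill_address"])


def follow_symbolic(disk, owner, index_address=0):
    """Symbolic pointer: the key must first be resolved through the index."""
    index = disk.read(index_address)
    return disk.read(index[owner["skill_key"]])


def count_ios(disk, action):
    """Run `action` and report how many I/Os it caused."""
    before = disk.io_count
    result = action()
    return result, disk.io_count - before


owner = disk.read(1)
direct_result, direct_ios = count_ios(disk, lambda: follow_direct(disk, owner))
symbolic_result, symbolic_ios = count_ios(disk, lambda: follow_symbolic(disk, owner))
print(direct_ios, symbolic_ios)
```

Both paths reach the same related record; the symbolic path costs one extra
I/O in this sketch because the index block must be read first.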

• One  of  the  most  significant  factors  in  the  performance  of  a DBMS  system  is 
the  choice  of  access  method. 

• A DBMS  which  supports  only  random  access  to  data  will  be  at  a significant 
performance  disadvantage  if  some  (or  much)  of  the  data  base  processing 
requires  sequential  retrieval  of  information  (as  may  be  the  case  with  batch 
processing).  Separate  direct  access  I/O  operations  may  be  required  to  retrieve 
each  record  in  sequence  based  upon  an  index,  or  alternatively  the  data  base 
may  first  have  to  be  sorted  into  the  desired  sequence  before  processing. 

• On the other hand, a DBMS that allows the data base to be organized using a
sequential access method, an indexed sequential access method, a direct access
method, and possibly also an indexed direct access method, permits the Data
Base Administrator to choose the appropriate access method which best meets
the varying retrieval requirements of applications which process that data
base.

3.  CPU  CAPABILITY 

• With  small  CPUs  in  particular,  the  CPU  power  may  become  the  limiting  factor 
before  the  I/O  capability. 

• The first and obvious option is to increase CPU power. However, this may
be difficult to justify and leads to the alternative of reducing the number of
instructions which the DBMS must process for each transaction request. In
order to do this it may be necessary for DBMS which are to be used on small
CPUs to provide only a small subset of the full functional capability of the
DBMS.

• This  subset  should  result  in  a reduced  instruction  path  length  and  consequently 
a lower  CPU  utilization.  However,  the  subset  DBMS  should  be  compatible 
with  the  full  system  and  capable  of  easy  growth  into  it. 

• Application  programs  written  for  the  subset  system  should  be  able  to  be 
migrated  with  little  or  no  change  and  process  the  data  base  supported  by  the 
full  DBMS. 

• In  summary,  factors  affecting  DBMS  performance  are  the: 

Ability  of  the  DBMS  to  support  multithread  processing,  permitting  the 
concurrent  processing  by  multiple  users  for  increased  transaction 
throughput. 

Facilities to reduce the number of I/Os per transaction by using buffer
pool techniques, record grouping and direct relationships.

Availability of a subset of the full DBMS function, enabling a reduction in
the number of instructions processed to satisfy transaction requests.


J.  SUMMARY  OF  EVALUATION  CRITERIA 


The  evaluation  criteria  identified  in  this  section  are  summarized  in  the  following 
paragraphs  and  subsequently  used  to  evaluate  a number  of  DBMS  in  Chapter  IV: 

• Basic Functional Capabilities: The most important criteria are easy
accessibility for application programmers and end users, the ability to support
multiple views of data, and the ability to provide data consolidation.


• Data Independence: The level of data independence supported by a DBMS is
dependent upon the number of levels of mapping. Up to three levels of
mapping may be used: an Internal map, an External map and an intermediate
Conceptual map which permits transformation of a physical data base view
into a logical data base view.


An  additional  level  of  data  independence  is  provided  by  field  level 
independence. 


To  assess  the  data  independence  capability  of  a DBMS,  the  impact  of 
changes  on  reloading  the  data  base,  recompiling  programs  or  changing 
program  logic  should  be  measured. 


• Data Integrity: A DBMS should provide exclusive control to ensure that data
is not lost through simultaneous updates, or through program failure.
The role of recovery/restart facilities in ensuring data integrity, and the
responsibility for providing those facilities, are very important.


• Data Security: In the data security area, the importance of various restriction
mechanisms and the enforceability of these mechanisms are critical.

• Ease of Use: Tools should be provided for the Data Base Administrator, the
application programmer and the end user to permit them to use the DBMS
based upon their level of requirement and skill.

• Cost/performance: Cost and performance factors include both tangible and
intangible costs and determination of various constraints to performance.

• A Data Base Management System/Data Communication system must contain a
proper balance of all of these ingredients in order to provide an adequate base
on which to build an effective, reliable and secure information system.


IV ANALYSIS OF CURRENT PRODUCTS


• In this section eight popular Data Base Software products in use today are
described:

      PRODUCT          VENDOR

A.    ADABAS           SOFTWARE AG
B.    DMS II           BURROUGHS
C.    DMS 170          CONTROL DATA
D.    DMS 1100         UNIVAC
E.    IDMS             CULLINANE
F.    IMS (DL/I)       IBM
G.    SYSTEM 2000      MRI SYSTEMS
H.    TOTAL            CINCOM

• Each data base product overview describes the vendor, the computing systems
supported, operating systems supported, and type of structure (hierarchies,
networks, inverted structures and relational data base).

• The general format used to assess each product is consistent with the
Evaluation Criteria provided in Section III, namely:


1.  General  Description 

2.  Basic  Functional  Capabilities 

3.  Data  Independence 

4.  Data  Integrity 

5.  Recovery/Restart 

6.  Data  Security 

7.  Ease  of  Use 

8.  Cost/Performance 

• For two of the products, namely IMS and TOTAL, two additional format points
associated with Distributed Processing and Distributed Data Base are included.

• A sample Personnel data base is frequently used for discussion, showing the
way in which that data would be designed using the particular product. The
Personnel data base includes name, address and payroll information for each
employee, together with skills, education and experience held by each
employee.


A.  ADABAS  (SOFTWARE  AG) 


1. GENERAL DESCRIPTION

• ADABAS (Adaptable Data Base System) was developed in Germany by
Software AG and supports IBM System/360, System/370, and Siemens 4004. The
operating systems supported include DOS, DOS/VS, OS (MFT, MVT), OS/VS1
and OS/VS2 (SVS, MVS).

• ADABAS  has  been  marketed  since  1971  and  has  been  installed  in  over  200 
accounts  worldwide. 

• ADABAS uses a sophisticated "inverted file" approach to data base
management. It has been described as similar to a relational data base approach
and permits complex queries to be made against simple data bases very
efficiently.

• While  ADABAS  does  not  provide  a Data  Communication  facility  itself, 
interfaces  are  maintained  to  enable  CICS,  INTERCOM,  COM-PLETE,  and 
TASK/MASTER  to  be  used  with  ADABAS. 

• An  ADABAS  data  base  is  physically  and  logically  organized  into  two  distinct 
sections.  These  are  the  Data  Storage  section  and  the  Associator  Section. 

a.  Data  Storage  Section 

• The  Data  Storage  contains  the  actual  data  which  resides  as  variable  length 
records  within  fixed  length  blocks  (approximately  3,000  bytes  depending  on  the 
particular  DASD  used). 

When data records are stored in these blocks by ADABAS, they are
compressed by eliminating right-most blanks of alphameric fields and
leading zeros of numeric fields.

Unpacked fields are held internally as packed. The resulting variable
length fields then have a length indicator appended.

• Fields  in  the  record  which  are  absent  are  indicated  as  "empty"  by  a one  byte 
indicator  inserted  at  the  appropriate  point  in  the  record. 

Similarly,  one  byte  can  indicate  up  to  255  consecutive  empty  fields.  (If 
the  empty  fields  are  at  the  end  of  the  record,  they  are  not  carried  at 
all.) 

• While  data  records  are  being  compressed  by  ADABAS,  they  may  be  enciphered 
for  security  reasons  so  that  the  data  appears  "scrambled"  on  disk.  The  result 
of  this  compression  is  quoted  by  ADABAS  as  requiring  only  50  to  80%  of  the 
disk  space  occupied  by  the  raw  input  data. 

• ADABAS  compresses  each  record  in  the  above  manner  and  inserts  as  many 
variable  length  records  as  possible  in  the  fixed  length  block,  while  leaving  an 
amount  of  free  space  to  permit  later  record  expansion. 
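
The compression rules described above can be illustrated with a short Python sketch. It is not the actual ADABAS storage format: the token layout, the function names, and the treatment of an all-zero numeric field as "empty" are assumptions made purely for illustration.

```python
def compress_field(value, numeric=False):
    """Compress one field value: drop leading zeros (numeric) or
    right-most blanks (alphameric). Returns None for an empty field.
    Assumption: a numeric field of all zeros is treated as empty."""
    stripped = value.lstrip("0") if numeric else value.rstrip(" ")
    return stripped if stripped else None


def compress_record(fields):
    """fields: list of (value, numeric) pairs in record order.
    Returns tokens: ("F", length, text) for a stored field and
    ("E", count) for a run of up to 255 consecutive empty fields.
    Empty fields at the end of the record are not carried at all."""
    tokens = []
    empty_run = 0
    for value, numeric in fields:
        text = compress_field(value, numeric)
        if text is None:
            empty_run += 1
            continue
        # Flush any pending run of empty fields before a real field.
        while empty_run > 0:
            run = min(empty_run, 255)
            tokens.append(("E", run))
            empty_run -= run
        tokens.append(("F", len(text), text))
    # A pending run at the end of the record is simply dropped.
    return tokens


record = [("SMITH     ", False), ("", False), ("", False),
          ("00042", True), ("   ", False)]
tokens = compress_record(record)
```

In this sketch the five-field record shrinks to three tokens: the name without its trailing blanks, one marker covering the two empty fields, and the number without its leading zeros. The trailing blank field disappears entirely.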


• Each  data  record  per  file  is  allocated  an  Internal  Sequence  Number  (ISN)  which 
is  a three  byte  binary  number  permitting  a file  to  contain  up  to  a maximum  of 
16,777,216  records.  A data  base  may  hold  up  to  255  files.  This  ISN  is  used  for 
subsequent  reference  to  the  data  record  by  ADABAS  in  resolving  data 
requests. 

b.  Associator  Section 


• The  Associator  is  an  index  which  is  constructed  by  ADABAS  as  the  data  base  is 
loaded.  One  index  is  constructed  for  each  file  which  has  Descriptor  fields  (key 
fields)  identified  by  the  Data  Base  Administrator. 

Each index contains an entry for each unique descriptor value in the
data  base,  and  also  contains  the  Internal  Sequence  Numbers  (ISN)  of 
each  data  record  which  has  that  key  value. 

The  result  is  an  Inverted  List  Structure  of  the  data  base,  enabling  all 
data  records  which  contain  a particular  descriptor  value  to  be  quickly 
identified  through  the  Associator  file. 

Any field in the data record may be defined by the Data Base
Administrator  as  a Descriptor  field.  Utilities  provided  by  ADABAS  load 
the  various  unique  field  values  for  that  descriptor  field  into  an  index  in 
the  Associator,  relating  each  unique  descriptor  value  with  the  ISN  of  all 
records  which  have  that  value. 

ADABAS  supports  up  to  500  fields  per  record,  of  which  up  to  200  fields  may  be 
defined  as  descriptor  fields  with  up  to  12  phonetic  descriptors  and  a maximum 
of  255  files  per  data  base. 

To  enable  ADABAS  to  relate  the  Internal  Sequence  Number  (ISN)  of  each  data 
record  in  the  data  base  to  the  actual  physical  location  of  that  record,  an 
Address  Converter  is  used. 

To  obtain  the  physical  block  locations  of  a particular  data  record,  ADABAS 
uses  the  ISN  as  an  index  into  the  Address  Converter,  from  which  it  obtains  the 
physical  block  location  of  the  record  for  access. 
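
The two-step lookup described above (descriptor value to ISN list, then ISN to physical block) can be sketched as follows. The in-memory dictionaries merely stand in for the disk-resident Associator, Address Converter and Data Storage; all names and sample values are hypothetical.

```python
# Inverted list: (descriptor name, value) -> list of ISNs.
associator = {
    ("SKILL", "COBOL"): [1, 3],
    ("SKILL", "FORTRAN"): [2],
}

# Address Converter: ISN -> physical block number.
address_converter = {1: 10, 2: 10, 3: 11}

# Data Storage: block number -> {ISN: record}.
blocks = {
    10: {1: {"NAME": "ADAMS"}, 2: {"NAME": "BAKER"}},
    11: {3: {"NAME": "CLARK"}},
}


def find_isns(descriptor, value):
    """Search the Associator only; no data block is touched."""
    return associator.get((descriptor, value), [])


def read_by_isn(isn):
    """Resolve the ISN through the Address Converter, then read the block."""
    block_no = address_converter[isn]
    return blocks[block_no][isn]


names = [read_by_isn(isn)["NAME"] for isn in find_isns("SKILL", "COBOL")]
```

Because records are located only through the Address Converter, a record could in principle be moved to another block by changing one converter entry, leaving the inverted lists untouched.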

c.  Data  Interrelationships 


• Data  in  separate  files  is  interrelated  by  ADABAS  by  "coupling"  the  two  files 
together  on  a common  descriptor. 

For  example,  a Payroll  data  base  and  a Skills  data  base  could  be  related 
(coupled)  using  either  the  person's  employee  number  or  name. 

Up to 80 files can be coupled in this way, but only a maximum of five
coupled  files  may  be  referenced  in  one  application  program  retrieval 
request. 

• ADABAS  is  able  to  resolve  a query  through  one  call  to  ADABAS,  with  search 
criteria  involving  up  to  five  coupled  files  and  a maximum  of  125  descriptors. 

• In  accessing  the  Associator  for  a particular  Descriptor  value,  ADABAS  quotes 
a maximum  of  three  physical  I/O  accesses  with  an  average  of  2.8. 

After  identifying  the  ISN  of  each  data  record  having  a particular 
Descriptor  value,  ADABAS  retrieves  the  data  record  by  first  accessing 
the  Address  Converter  to  identify  the  physical  location  of  that  record, 
and  then  accessing  the  record  itself. 

ADABAS  quotes  an  average  of  2.8  physical  I/O  accesses  to  retrieve  the 
actual  data  record  through  the  Address  Converter. 

Each  additional  record  to  be  retrieved  will  also  require  an  average  of 
2.8  physical  I/O  accesses. 

• Apart  from  the  index  approach,  ADABAS  provides  access  by  allowing  the  user 
to  allocate  his  own  ISN  (unique  key).  An  access  on  this  particular  key  will  only 
require  the  Address  Converter. 

• ADABAS  also  provides  ADAM,  a random  access  approach  that  permits  a data 
record  to  be  retrieved  by  calculating  its  physical  location  based  upon  a 
randomizing  algorithm  applied  to  the  descriptor  value. 

• Selection  of  data  records  based  upon  a number  of  descriptor  fields  requires  a 
corresponding  additional  number  of  physical  I/O  accesses. 
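
As a rough planning aid, the quoted averages can be combined into a simple estimate. The sketch below assumes that the quoted figures simply add, that each descriptor searched costs one Associator access sequence, and that nothing is already held in a buffer. None of these assumptions is stated by the vendor, and the function name is hypothetical.

```python
ASSOCIATOR_IO = 2.8   # quoted average accesses per descriptor value searched
RECORD_IO = 2.8       # quoted average accesses per data record retrieved


def estimated_io(descriptors, records):
    """Rough average physical I/O count for one query, assuming the
    quoted averages simply add and no block is already buffered."""
    return ASSOCIATOR_IO * descriptors + RECORD_IO * records
```

Under these assumptions, a query on one descriptor that retrieves one record averages about 5.6 physical accesses, and a query on three descriptors that retrieves ten records averages roughly 36.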

• As  no  pointers  are  contained  within  the  data  records  themselves,  with  all  data 
relationships  established  through  the  Associator,  the  addition  of  new  data 
relationships can be established very readily by means of a utility program.
ADABAS  supplies  a utility  that  enables  the  Data  Base  Administrator  to  add 
new  descriptors  to  the  Associator,  or  couple  or  uncouple  files,  without  having 
to  reload  the  data  base.  This  flexibility  is  one  of  ADABAS'  strong  points. 

Thus, if there is a requirement to retrieve records based upon a large number
of different field values, ADABAS requires that each of these fields be defined
as a descriptor in the Associator. This has implications for updates,
additions or deletions to data records in the data base.

ADABAS  must  update  each  descriptor  field  in  the  Associator  to  reflect 
a changed  field  value,  or  the  addition  or  deletion  of  a data  record. 
Thus,  in  a highly  volatile,  complex  data  base,  considerable  I/O  activity 
may  result. 

To  minimize  this  update  activity,  ADABAS  provides  an  off-line  utility 
to  add  records  to  the  file  faster  than  it  can  be  done  on-line. 

However,  in  an  on-line  inquiry/update  environment,  with  complex 
inquiries  involving  a large  number  of  descriptors,  performance  may 
suffer. 

d.  Data  Definition  Language 

Because of the structure of ADABAS, whereby no record interrelationships or
pointers exist within the data records themselves, the Data Definition
Language generally requires only that each field in the data record be
identified by name and type (fixed length, variable length, periodic
occurrence, etc.).

Security levels (up to a maximum of 15) can be specified on a field and/or file
basis to control access to information. Thus, a program with a security level
of 12 has authority to access any field with a security level of 12 or
lower.

• ADABAS enables application programs to request fields by name. This results
in  a high  degree  of  data  independence  with  the  ability  to  add  new  fields  to  the 
record  without  impacting  existing  applications  which  do  not  reference  the  new 
fields,  and  without  reloading  the  data  base. 

• Data relationships are established by means of an ADABAS utility which loads
each separately identified descriptor field into the Associator.

Descriptor field values are extracted from the file, together with the
ISN of each data record that contains a value for that descriptor field
(that is, each record in which the field is not "empty").


These  descriptor  field  values  are  then  sorted  into  sequence,  and  the  ISN 
for  all  data  records  that  contain  a particular  descriptor  field  value  is 
stored  in  the  Associator  as  an  entry  for  that  Descriptor  index. 


This  utility  is  used  for  definition  of  each  descriptor  field  in  the  data 
records. 
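
The extract, sort and store steps performed by this utility can be sketched in Python. This is only an illustration of the general inverted-list technique under assumed data shapes; the function name and record layout are hypothetical and do not describe the utility's real interface.

```python
from itertools import groupby


def build_descriptor_index(records, field):
    """records: dict mapping ISN -> record (a dict of field values).
    Returns a list of (value, [ISNs]) entries sorted by value, skipping
    records in which the field is absent or empty."""
    # Step 1: extract (descriptor value, ISN) pairs from the file.
    pairs = [
        (rec[field], isn)
        for isn, rec in records.items()
        if rec.get(field) not in (None, "")
    ]
    # Step 2: sort into descriptor value sequence (then by ISN).
    pairs.sort()
    # Step 3: store one entry per unique value with all of its ISNs.
    return [
        (value, [isn for _, isn in group])
        for value, group in groupby(pairs, key=lambda pair: pair[0])
    ]


personnel = {
    1: {"DEPT": "B"},
    2: {"DEPT": "A"},
    3: {"DEPT": "B"},
    4: {"NAME": "X"},   # DEPT absent, so not indexed
    5: {"DEPT": ""},    # DEPT empty, so not indexed
}
dept_index = build_descriptor_index(personnel, "DEPT")
```

Since the index is derived entirely from the stored records, it can be rebuilt or added later without touching the records themselves, which is consistent with the ability to add descriptors without a reload.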


• Files  are  "coupled"  by  another  ADABAS  supplied  utility  using  a defined 
descriptor  field  which  is  common  to  both  files.  With  a coupling  relationship 
defined,  based  upon  this  common  descriptor,  search  criteria  which  apply  to 
both  files  can  be  readily  handled  by  ADABAS. 


• With  the  requirement  to  include  each  unique  descriptor  field  value  in  the 
associator,  together  with  the  ISN  of  all  records  that  contain  that  value,  there 
is  some  redundancy  of  data  storage. 

This redundancy, together with the ISN lists held for each descriptor
field value and the disk storage required for the Address Converter,
represents an overhead on top of the disk storage required for the
actual data records. The compression which ADABAS applies to data
records results in the ADABAS file occupying between 50 and 80% of the
space of the original raw input data.

With  the  overhead  required  by  ADABAS  to  establish  data  relationships, 
typically  the  total  disk  storage  required  is  still  less  than  that  occupied 
by  the  raw  input  data. 

Instances  where  the  total  disk  storage  is  greater  than  the  raw  input 
data  would  occur  when  rather  long  fields  were  used  as  descriptors,  and 
many  descriptors  were  defined  in  the  data  record. 

e.  Data  Manipulation  Languages 

ADABAS  uses  a standard  CALL  interface  which  enables  it  to  be  used  with 
ASSEMBLER,  COBOL,  FORTRAN,  and  PL/I. 

Seven commands are used to carry out all data base operations. These are:

FIND: Search Associator.

READ: Read data record.

UPDATE: Modify field values, delete field values, add field values.

ADD: Add logical records.

DELETE: Delete logical records.

OPEN/CLOSE: Logical open and close of record.

CHECKPOINT: Indicate logical restart point.


• The  FIND  command  allows  the  user  to  specify  a logical  search  to  identify 
records  in  the  data  base  meeting  particular  selection  criteria. 

Several complex partial search criteria, each comprising one or more
descriptor fields, can be specified. Logical relationships between
descriptor fields can be indicated by the logical operators AND, OR,
FROM-TO and BUT-NOT.

To  identify  these  records,  ADABAS  does  not  access  the  data  base  itself, 
but  instead  uses  only  the  Associator.  Consequently,  the  number  of 
physical  accesses  required  to  perform  the  query  is  substantially  reduced 
and  to  a large  degree  independent  of  the  actual  volume  of  data 
contained  in  the  data  base. 

Another  variation  of  the  FIND  command  is  FIND  COUPLED.  This 
allows  queries  to  be  specified  which  involve  descriptors  which  are 
common  to  coupled  files. 
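
One way to picture how a FIND can be resolved from the Associator alone is to treat each inverted list as a set of ISNs and each logical operator as a set operation. The sketch below rests on that interpretation, which is an assumption rather than a description of ADABAS internals; the index contents and function names are hypothetical.

```python
# Hypothetical inverted lists: descriptor -> [(value, {ISNs})] in value order.
index = {
    "SKILL": [("COBOL", {1, 3, 4}), ("FORTRAN", {2, 4})],
    "AGE": [(25, {1}), (31, {2, 3}), (47, {4})],
}


def equal(descriptor, value):
    """ISNs of all records whose descriptor equals the given value."""
    for entry_value, isns in index[descriptor]:
        if entry_value == value:
            return set(isns)
    return set()


def from_to(descriptor, low, high):
    """ISNs of all records whose descriptor lies in the range low..high."""
    result = set()
    for entry_value, isns in index[descriptor]:
        if low <= entry_value <= high:
            result |= isns
    return result


# AND -> intersection, OR -> union, BUT-NOT -> difference.
cobol_and_thirties = equal("SKILL", "COBOL") & from_to("AGE", 30, 39)
either_skill = equal("SKILL", "COBOL") | equal("SKILL", "FORTRAN")
cobol_but_not_fortran = equal("SKILL", "COBOL") - equal("SKILL", "FORTRAN")
```

No data record is read at any point in this sketch. Only the resulting ISN sets would then be passed to the READ commands.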

• There  are  five  different  READ  commands  available  to  the  ADABAS  user: 
READ  RANDOM  (by  ISN),  READ  LOGICAL  SEQUENTIAL,  READ  PHYSICAL 
SEQUENTIAL,  READ  DESCRIPTOR,  and  READ  FIELD  DESCRIPTION  TABLE. 

The  first  three  READ  commands  result  in  the  reading  of  a record  or 
records  from  the  data  base,  while  the  last  two  commands  require  access 
to  the  Associator  only. 

The  READ  RANDOM  command  results  in  the  reading  of  the  single 
record  from  the  data  base,  based  on  a particular  ISN  value,  and  requires 
reference  to  the  Address  Converter  first. 

The  READ  LOGICAL  SEQUENTIAL  reads  a file  in  the  sequence  of  the 
values  of  a descriptor  field,  while  READ  PHYSICAL  SEQUENTIAL 
reads an entire file in physically sequential order. The records are read
in  the  exact  order  in  which  they  are  currently  physically  located  in  the 
data  base. 

The READ DESCRIPTOR VALUE command enables the program to quickly
determine all of the current values of any descriptor field and
references the Associator only. Using ADASCRIPT, a histogram of any
descriptor may be displayed.

The  READ  FIELD  DESCRIPTION  TABLE  command  is  used  to  read  the 
field  description  entries  from  the  Associator  for  any  file  in  the  data 
base.  This  returns  to  the  application  program  the  field  name,  standard 
length  and  format,  and  special  field  characteristics  for  each  field. 

The  UPDATE  command  enables  the  user  to  modify  the  values  of  existing  fields 
in  a record.  The  field  length  may  be  increased,  reduced  or  left  the  same 
length  as  a result  of  the  modification  without  having  to  redesign  or  recreate 
the  existing  data. 

Only  those  fields  to  be  updated  need  be  specified.  All  other  fields  in 
the  record  remain  unchanged. 

ADABAS  then  updates  all  inverted  lists  in  the  Associator  to  reflect  the 
change  in  field  content,  if  that  field  was  defined  as  a descriptor  field. 

The ADD RECORD command enables a new record to be added to an existing
file. Not all fields of the record need be present at the time the record is
first created.

Fields  initially  empty  may  later  be  added  to  the  record  by  means  of  the 
UPDATE  commands.  As  the  new  record  is  added,  ADABAS  also  updates 
the  Associator  to  reflect  the  existence  of  that  record  based  on  the 
content  of  any  descriptor  fields. 

If a large number of records are to be added to an existing file, an
ADABAS  utility  can  be  used  to  add  the  records  as  an  alternative  to  the 
ADD  RECORD  command. 

• The DELETE command enables an application program to delete a logical
record from the file. During the deletion of the record, all related descriptor
entries in the Associator are also deleted. The disk storage freed up in the
data base and the Associator is made available for re-use as required, for any
new information later added to the file.

• The  OPEN  and  CLOSE  commands  are  used  logically  to  start  and  end 
application  processing  against  the  data  base.  The  OPEN  command  allows  the 
program  to  specify  security  information  to  permit  later  access  to  authorized 
fields  during  application  processing. 

• The CHECKPOINT command allows the application program to specify logical
checkpoints during the processing of data base update programs. A maximum
of 199 checkpoints may be taken by any one program. In the event of a system
or program failure, these checkpoints enable ADABAS utilities to restart
processing from the most recent checkpoint taken.

A synchronized checkpoint facility is provided by ADABAS for
application programs operating concurrently in batch or on-line mode,
or both.

• ADABAS  provides  several  commands  which  can  be  used  for  recovery/restart 
purposes.  These  are: 

C5  - Write  to  Log:  This  command  enables  user  data  to  be  written  to 
the  ADABAS  Log  tape  (such  as  an  on-line  transaction). 

ET  - End  Transaction:  The  ET  command  is  for  use  by  all  on-line  update 
programs.  It  is  issued  by  an  application  program  to  indicate  the  logical 
completion  of  a user  transaction  which  consists  of  one  or  more 
ADABAS  update  commands. 

BT - Backout Transaction: The BT command is issued by the user to
remove  the  effect  of  a transaction  which  has  abnormally  terminated. 

2. BASIC FUNCTIONAL CAPABILITIES

a.  Easy  Accessibility 

ADABAS uses a CALL interface which enables it to be invoked from a
machine-oriented language such as ASSEMBLER, commercial languages such
as PL/I and COBOL, and scientific languages such as PL/I and FORTRAN.

End  user  languages  are  provided  for  the  commercial  user  by  ADASCRIPT  and 
ADAWRITER. 

ADASCRIPT  is  a procedural  language  for  ADABAS  which  can  be  used  in 
a query  and  update  environment.  It  enables  queries  to  be  made  in  an 
English-like  language  which  is  translated  into  the  necessary  ADABAS 
calls. 

ADAWRITER  is  a report  writer  for  use  in  the  batch  environment  which 
enables  the  user  to  select,  format  and  print  information  from  the  data 
base. 

ADABAS  does  not  provide  a Data  Communication  facility  itself,  but  instead 
Software  AG  and  its  users  have  developed  interfaces  to  CICS,  COM-PLETE, 
TSO,  IMS  DC,  INTERCOMM  and  TASK/MASTER. 

This  raises  the  possibility  of  security  and  integrity  exposures,  depending 
upon  the  extent  to  which  ADABAS  relies  on  the  facilities  provided  by 
the  Data  Communication  package. 

Another concern is the ability of the Data Communication interface to
resolve deadlock situations between on-line users and also between
on-line and batch processing.

b. Multiple Views of Data

• ADABAS enables data to be accessed sequentially (logical sequential or
physical sequential). Random access to data is also supported by the ADAM
access method, such that it is possible to access a record directly through a
randomizing algorithm applied by ADABAS to a key.
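
The idea of direct access through a randomizing algorithm can be sketched as follows. The report does not describe the actual ADAM algorithm, so CRC-32 modulo a block count is used here purely as a stand-in; the block count and function name are hypothetical.

```python
import zlib

NUM_BLOCKS = 97  # hypothetical number of blocks allocated to the file


def adam_block(key):
    """Map a key to a block number with a stand-in randomizing function
    (CRC-32 modulo the block count). The real ADAM algorithm is not
    described in this report."""
    return zlib.crc32(key.encode("ascii")) % NUM_BLOCKS


block_for_employee = adam_block("E1001")
```

The same key always yields the same block number, so the block is computed from the key rather than looked up in an index. Different keys can map to the same block, which any real implementation must handle.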

The  strength  of  ADABAS  lies  in  its  ability  to  support  indexed  access  to  data, 
particularly  through  multiple  indices. 

ADABAS  uses  an  Inverted  List  structure  such  that  indexes  can  be 
defined  based  on  up  to  200  fields  in  a record  (maximum  of  500  fields  per 
record). 


These  index  fields  are  referred  to  as  Descriptor  values,  and  reside  in  an 
Associator  file. 

• Each descriptor value in the Associator contains the identification of every
record in the data base that contains that particular value of the descriptor.
This identification is an Internal Sequence Number (ISN), which is
converted through a separate Address Converter to a physical block location in
the data base.


A characteristic  of  the  ADABAS  Inverted  List  structure,  however,  is  that  no 
data  from  other  fields  in  the  record  may  be  stored  with  a descriptor  field  in 
the  Associator.  Thus,  while  the  Associator  can  be  used  to  identify  all  records 
containing  particular  descriptor  values,  those  records  must  be  each  accessed 
to  obtain  additional  fields  from  the  records. 


• An alternative data base system approach, whereby significant fields may be
extracted from the record and stored within the index entry associated with a
particular key value (as used by IBM's DL/I), is not supported by ADABAS.

• Additionally, there appears to be no way to sequence repeating group
occurrences within the record without detailed programmer involvement.

c.  Data  Consolidation 

• ADABAS  provides  very  good  support  for  variable  length  records. 

There is a limit of 500 repeating group types (fields) per data record
and, additionally, a maximum of 99 occurrences per repeating group
(field) type.

This  latter  constraint  may  be  a significant  restriction  in  an  environment 
that  requires  an  unlimited  number  of  occurrences  per  repeating  group 
(field)  type. 

• ADABAS  supports  a "flat  file"  record  structure,  with  the  ability  to  specify 
within  the  data  record  a repeating  group  (periodic  field).  However,  a 
repeating  field  cannot  itself  contain  another  repeating  field.  (It  can  instead 
contain  a multiple-valued  field  which  can  give  much  the  same  result). 

• An  alternative  approach  may  be  to  define  a separate  data  file  with  another 
data  record  containing  those  repeating  fields,  but  this  introduces  further 
complexity  and  the  possibility  of  data  redundancy,  as  well  as  additional 
physical  I/O  accesses  necessary  to  retrieve  that  related  data. 

• Data consolidation is achieved by ADABAS by "coupling" files together. The
"coupling" is achieved using an ADABAS utility, and relates two files by a
common descriptor.

This  data  consolidation  is  achieved  without  requiring  pointers  within 
data  records  indicating  the  established  relationship. 

Instead,  the  relationship  is  indicated  in  the  Associator  based  on  the 
common  descriptor  entry. 
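
Coupling on a common descriptor, with the relationship held outside the records, can be sketched as below. The data shapes, the function name and the sample files are assumptions for illustration only and do not reflect the coupling utility's actual output.

```python
from collections import defaultdict


def couple(file_a, file_b, descriptor):
    """file_a, file_b: dicts mapping ISN -> record (dict of fields).
    Returns {descriptor value: ([ISNs in A], [ISNs in B])} for values
    present in both files. The records themselves are left untouched."""
    def invert(records):
        lists = defaultdict(list)
        for isn, rec in sorted(records.items()):
            if descriptor in rec:
                lists[rec[descriptor]].append(isn)
        return lists

    a, b = invert(file_a), invert(file_b)
    return {value: (a[value], b[value]) for value in a if value in b}


payroll = {1: {"EMPNO": 100}, 2: {"EMPNO": 200}}
skills = {
    7: {"EMPNO": 100, "SKILL": "COBOL"},
    8: {"EMPNO": 100, "SKILL": "PL/I"},
    9: {"EMPNO": 300, "SKILL": "FORTRAN"},
}
coupling = couple(payroll, skills, "EMPNO")
```

Employee 200 has no skills record and skills record 9 has no payroll record, so neither appears in the coupling. This also shows why consistency between coupled files is left to the application.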

It is the responsibility of the application programmer to ensure data
consistency between coupled files.

A maximum  of  80  files  may  be  coupled  together  in  a data  base,  but  only  five 
coupled  files  may  be  referenced  in  a particular  application  program  search 
request. 

ADABAS  supports  a maximum  record  length  of  approximately  3,000  bytes, 
based  upon  the  storage  characteristics  of  the  particular  physical  disk  drive 
being  used. 

These  constraints  are  more  than  adequate  for  simple  data  bases,  but  may 
inhibit  the  use  of  ADABAS  for  more  complex  data  bases,  particularly  those 
with  records  that  may  contain  more  than  99  occurrences  per  repeating  group 
type  (field). 

3. DATA INDEPENDENCE

a. Levels of Mapping

ADABAS  provides  an  Internal  level  of  mapping  which  reflects  the  physical 
data  organization.  In  addition,  the  ability  to  request  fields  by  name  provides  a 
further  level  of  field  independence. 

ADABAS  supports  a Conceptual  and  External  mapping  level  through  the  use  of 
ADAMINT  (a  high-level  macro  interface  to  ADABAS). 

This  removes  the  need  for  the  application  programmer  to  be  aware  of 
which  files  are  coupled. 

Requests  against  coupled  files  using  ADAMINT  do  not  have  to  access 
explicitly  those  files  which  are  so  coupled. 

Additionally, ADAMINT eliminates the need for the application
programmer to specify the length and format in which the data is to be
delivered to his program. ADAMINT (and ADABAS) carries out the
necessary format and length conversions as requested by the Data Base
Administrator, both on reading data from the data base and on
returning data to the data base.

• Thus,  ADAMINT  enables  the  Data  Base  Administrator  to  define  file  coupling 
and  a subset  of  the  data  base  for  specific  programs.  This,  together  with  the 
field  level  data  independence  of  ADABAS,  enables  fields  to  be  added  or 
deleted  from  the  data  base  without  affecting  programs  that  do  not  reference 
those  fields  (or  files). 

b.  Data  Base  Changes 

• Change Device Type: Changing a device type requires no program logic
change or recompilation but will, of course, require the data base to be
transferred to the new device.

• Change  Access  Method:  ADABAS  uses  BDAM  for  the  support  of  inverted  lists, 
and  ADAM  for  direct  access.  A change  may  be  made  from  one  access  method 
to  the  other. 

• Change Entity View: A change in the view of the data base is carried
out by means of an ADABAS utility which scans the data base and establishes
new descriptor fields in the Associator, or couples files together based upon a
common descriptor field. No program logic change, recompilation or data base
reloading is necessary.

• Add New Entity: The addition of a new record can be achieved without any
change in program logic, recompilation or data base reloading.

• Add New Repeating Group Type: Similarly, the addition of a new repeating
group type (field) can be achieved with no changes in logic, recompilation or
data base reloading.

• Add New Relationship: The addition of a new relationship only involves an
Associator scan by an ADABAS utility to couple two files based upon a
common descriptor field.

• Add New Field to Repeating Group: The addition of a new repeating group
field requires no change in logic, recompilation or data base reloading.

• Change Field Format: A change in field format requires no program logic
change or recompilation, but may require a data base reload depending upon
the extent of that change.

4. DATA INTEGRITY

a.  Exclusive  Control 


ADABAS supports an Exclusive Control lockout mechanism that operates at
the data record level. This applies both within a partition and across
partitions.

The lowest isolated level is the data base, as no program isolation mechanism
is provided by ADABAS. This means that ADABAS must lock out two
concurrent requests to update the data base and single thread those update
requests through the data base.

A deadlock is possible but is detected in ADABAS V4.1, and backout is
automatically initiated.

5. RECOVERY/RESTART


a.  Recovery 

• ADABAS  supports  the  logging  of  after  images  for  recovery  of  the  data  base  in 
the  event  of  an  I/O  error.  Only  one  log  tape  is  supported  - the  ability  to 
create  dual  log  tapes  for  additional  data  integrity  of  the  log  itself  is  not 
provided. 

• A Copy/Restore  utility  is  available  to  produce  backup  copies  of  the  data  base. 
No  provision  is  made  for  a summarization  of  the  log  to  condense  system 
activity  (to  only  the  most  recent  changes  to  data  records  or  fields)  for 
optimum  recovery  performance. 

• A utility  is  provided  to  recover  the  data  base.  This  recovery  utility  is  able  to 
recover  down  to  the  physical  block  level  of  the  data  base.  It  is  not  necessary 
to  recover  the  entire  data  base. 

b.  Batch  Restart 

• ADABAS  supports  the  logging  of  before  images  to  enable  a batch  program  to 
be  recovered  by  backing  out  all  uncompleted  data  base  activity  to  a previous 
checkpoint. 

• A utility is provided to complete the log tape in the event of a system failure
and then use that log tape to back out activity resulting from a partially
completed program to a previous checkpoint.

c.  On-Line  Restart 

• As  ADABAS  does  not  provide  a Data  Communication  facility  itself,  it  is 
dependent  upon  those  facilities  provided  by  CICS,  INTERCOMM  or 
TASK/MASTER.  However,  it  does  make  provision  for  logical  transaction 
recovery  and  backout. 

• The logical completion of an ADABAS transaction is indicated by the
application program issuing an ET (End Transaction) command. Following an
ADABAS or system failure, ADABAS automatically backs out all transactions
for which an ET command was not logged.
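
The interplay of before images, the ET command and backout can be illustrated with a toy record store. This is a sketch of the general technique only: the class, its method names and the in-memory log are hypothetical, and the real ADABAS log format is not described in this report.

```python
class TinyStore:
    """Toy record store with a before-image log and ET/BT operations."""

    def __init__(self):
        self.records = {}   # ISN -> record value
        self.pending = []   # before images logged since the last ET

    def update(self, isn, value):
        # Log the before image (None if the record did not yet exist).
        self.pending.append((isn, self.records.get(isn)))
        self.records[isn] = value

    def end_transaction(self):
        """ET: updates made since the last ET are now logically complete."""
        self.pending.clear()

    def backout_transaction(self):
        """BT: restore before images in reverse order of logging."""
        while self.pending:
            isn, before = self.pending.pop()
            if before is None:
                self.records.pop(isn, None)
            else:
                self.records[isn] = before


store = TinyStore()
store.update(1, "A")
store.end_transaction()      # transaction 1 completes
store.update(1, "B")
store.update(2, "C")
store.backout_transaction()  # transaction 2 never reached its ET
```

After the backout, only the effect of the completed transaction remains, which is the outcome described above for transactions without a logged ET.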

• The  use  of  different  physical  log  devices  raises  a potential  data  integrity 
exposure  due  to  a "window"  during  which  logging  may  have  taken  place,  for 
example,  on  the  CICS  log  but  a system  failure  occurred  before  logging  could 
also  take  place  for  the  same  activity  on  the  ADABAS  LOG,  or  vice  versa. 

• CICS uses a synchronization and key-pointing approach on an individual task
basis rather than requiring system-wide quiescing for a system checkpoint.
Thus, each concurrently executing CICS task may independently take a
synchronization point which does not affect any other tasks in the system at
that time.

6.  DATA  SECURITY 

• ADABAS uses a security mechanism based upon a security level with values
from 1 to 15.

This  security  level  can  be  applied  to  each  file  or  field  within  the  file 
and  is  compared  with  the  security  level  of  a user  (batch  or  on-line 
program)  requesting  access  to  the  file  or  field. 

Only  those  users  with  the  security  level  greater  than  or  equal  to  the  file 
or  field  security  level  are  permitted  access. 

• A different security level may be specified for update than for read.
However,  no  distinction  is  made  between  update,  add  and  delete  security. 
They  are  all  grouped  together  as  an  "update"  security. 

• The  field  level  security  offered  by  ADABAS  provides  a great  deal  of  control 
over  the  access  to  individual  fields.  However,  once  that  security  is  broken  (by 
a user specifying a security level high enough), the user can access all fields
with  that  security  level  or  lower.  Thus,  specifying  security  level  15  allows 
access  to  all  fields  within  the  data  record. 
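
The level comparison described in this section can be sketched in a few lines. The field names and their levels are hypothetical, and the sketch covers read access only.

```python
def may_access(user_level, field_level):
    """Access is permitted when the user's security level is greater
    than or equal to the level assigned to the file or field."""
    return user_level >= field_level


# Hypothetical read security levels for fields of a Personnel record.
field_levels = {"NAME": 1, "ADDRESS": 4, "SALARY": 12}


def visible_fields(user_level, levels=field_levels):
    """All fields a user of the given level may read."""
    return sorted(
        name for name, level in levels.items()
        if may_access(user_level, level)
    )
```

For example, `visible_fields(4)` yields the name and address fields but not the salary field, while `visible_fields(15)` yields every field. The scheme is strictly cumulative, so a level cannot grant the salary field without also granting every lower-level field.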

No  provision  is  made  by  ADABAS  to  ensure  that  the  application  programmer  is 
only  aware  of  those  fields  or  elements  within  the  data  record  that  he  is 
permitted  to  have  access  to.  That  is,  the  Data  Base  Administrator  is  unable  to 
specify  a subset  of  the  fields  to  be  accessible  to  a program  in  any  other  way 
than  through  the  allocation  of  a sufficiently  high  security  level. 

7. EASE OF USE

a. Data Base Administrator


The  Data  Definition  Language  is  a non-procedural  language,  and  allows  the 
DBA  to  specify  group  fields  that  contain  within  them  individual  fields  by 
means  of  a hierarchical  structure  similar  to  that  used  with  higher  level 
languages.  The  DBA  specifies  the  field  name,  length,  type  and  format. 

No  design  aids  are  apparently  provided  by  ADABAS,  but  a data  base  status 
reporting  utility  is  provided  as  a measurement  aid. 

No  Data  Dictionary  utilities  or  programs  are  provided  for  documentation  and 
control,  but  a special  form  of  the  READ  command  is  provided  which  enables 
the  descriptor  tables  in  the  Associator  to  be  accessed  so  that  a user  can 
develop  his  own  data  dictionary  support. 

Utilities  are  provided  for  data  base  restructuring  together  with  a utility  which 
converts  an  ISAM  file  to  an  ADABAS  data  base. 

Externals  education  is  provided  along  with  documentation  as  user  guides. 
However,  little  information  is  provided  in  the  way  of  Internals  of  ADABAS. 

b. Application Programmer


The Data Manipulation Language uses a CALL interface in the appropriate
programming language (ASSEMBLER, COBOL, PL/I and FORTRAN).

The application programming interface is kept simple, with only seven
commands supported and with additional variations in the READ command. Each
command can reference a maximum of four record types in four coupled files.

Extensive  data  research  facilities  are  provided  which  are  applied  against 
descriptor  fields  in  the  Associator.  These  enable  the  application  programmer 
to  specify  high,  low  and  equal  comparisons,  together  with  the  use  of  boolean 
operators  and  multiple  record  types. 
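A minimal sketch of how such a search can be resolved against inverted lists follows. It assumes that each descriptor value maps to a set of Internal Sequence Numbers (ISNs) and that boolean operators become set operations. The data and function names are hypothetical, and this is not the ADABAS CALL interface.

```python
from collections import defaultdict


def build_inverted_lists(records, descriptors):
    """Map (descriptor, value) -> set of ISNs, a stand-in for the Associator."""
    lists = defaultdict(set)
    for isn, record in records.items():
        for name in descriptors:
            lists[(name, record[name])].add(isn)
    return lists


def find(lists, name, low=None, high=None):
    """ISNs whose descriptor value lies in [low, high]; either bound may be open."""
    hits = set()
    for (field, value), isns in lists.items():
        if field != name:
            continue
        if (low is None or value >= low) and (high is None or value <= high):
            hits |= isns
    return hits


# Hypothetical data: ISN -> record.
RECORDS = {
    1: {"DEPT": "SALES", "SALARY": 9000},
    2: {"DEPT": "SALES", "SALARY": 12000},
    3: {"DEPT": "MANUF", "SALARY": 15000},
}
LISTS = build_inverted_lists(RECORDS, ["DEPT", "SALARY"])

# DEPT = SALES AND SALARY >= 10000, resolved without reading any data record.
sales_high = find(LISTS, "DEPT", "SALES", "SALES") & find(LISTS, "SALARY", low=10000)
# DEPT = MANUF OR SALARY <= 9000.
manuf_or_low = find(LISTS, "DEPT", "MANUF", "MANUF") | find(LISTS, "SALARY", high=9000)
```

The search touches only the lists. The data records themselves would be read only for the ISNs that survive the boolean combination.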

c.  End  User 


Two user languages are supported: ADASCRIPT+ and ADAWRITER.

ADASCRIPT+ is a query language that permits easy specification of retrieval
requests, primarily for batch processing. However, it can also be invoked in
an on-line environment for batch processing. It provides a retrieve, update,
add and delete capability.

ADAWRITER  is  a formatting  and  report  preparation  facility  that  can  be  used 
for  extraction  and  production  of  formal  reports. 

8. COST/PERFORMANCE

a. Measurable Costs


ADABAS  can  be  purchased  for  $120,000  for  the  OS  version  or  $80,000  for  the 
DOS  version.  It  is  available  on  a one  year  lease  for  $4,500  per  month  for  the 
OS  version  or  $3,000  per  month  for  the  DOS  version.  A five  year  lease  is 
available  for  $2,500  per  month  for  the  OS  version  or  $1,667  per  month  for  the 
DOS  version. 

• Maintenance charges for a purchased system are $6,000 per year.

• These  prices  are  intended  to  be  indicative  only  and  may  vary  depending  on  the 
country  and  situation.  For  example,  multiple  CPU  discounts  are  provided. 
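As a worked comparison of the figures above, the following sketch totals purchase and lease costs over a period. It assumes that the $6,000 maintenance charge applies equally to the OS and DOS versions, that a lease carries no separate maintenance charge, and that no discounting is applied. None of these assumptions is stated in the price list.

```python
# Prices quoted above (US dollars); the report notes they are indicative only.
PURCHASE = {"OS": 120_000, "DOS": 80_000}
LEASE_PER_MONTH = {
    ("OS", 1): 4_500, ("DOS", 1): 3_000,   # one year lease
    ("OS", 5): 2_500, ("DOS", 5): 1_667,   # five year lease
}
MAINTENANCE_PER_YEAR = 6_000  # applies to a purchased system


def purchase_cost(version, years):
    """Purchase price plus annual maintenance over the given period."""
    return PURCHASE[version] + MAINTENANCE_PER_YEAR * years


def lease_cost(version, lease_years, years):
    """Total lease payments, assuming the monthly rate holds for the period."""
    return LEASE_PER_MONTH[(version, lease_years)] * 12 * years


for version in ("OS", "DOS"):
    print(version, purchase_cost(version, 5), lease_cost(version, 5, 5))
```

Under these assumptions the OS version costs $150,000 over five years either way, while the DOS version costs $110,000 purchased against $100,020 on the five year lease.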

b.  Real  Memory 

• ADABAS has a working set of approximately 120K for the nucleus, and
approximately 38K for dynamic buffers.

• Each additional user requires 1K for an interface within the user partition.

• TP users share one 1K interface.

• A multi  thread  interface  is  available  that  requires  an  additional  8K  and  can  be 
used  either  on-line  or  in  a batch  environment. 
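The memory figures above can be combined into a rough estimate. The sketch assumes that every batch user, including the first, needs its own 1K interface and that all TP users together count as one interface. The function name is hypothetical.

```python
NUCLEUS_K = 120          # working set of the nucleus
DYNAMIC_BUFFERS_K = 38   # dynamic buffers
USER_INTERFACE_K = 1     # per user partition; all TP users share one
MULTI_THREAD_K = 8       # optional multi thread interface


def adabas_real_memory_k(batch_users, tp_users=0, multi_thread=False):
    """Approximate real memory in K, from the figures quoted above."""
    interfaces = batch_users + (1 if tp_users > 0 else 0)
    total = NUCLEUS_K + DYNAMIC_BUFFERS_K + interfaces * USER_INTERFACE_K
    if multi_thread:
        total += MULTI_THREAD_K
    return total
```

For example, three batch users plus twenty TP users with the multi thread interface come to 120 + 38 + 4 + 8 = 170K in this model.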

c.  Performance  Constraints 

• ADABAS  is  available  in  either  a single  thread  or  multi-thread  environment. 
The  multi-thread  support  provided  in  the  latest  version  of  ADABAS  is  likely  to 
be  most  generally  used. 

• Exclusive  Control  lockout  occurs  at  the  data  record  level,  both  across 
partitions  and  within  the  same  partition. 

• Buffer  management  is  provided  from  a common  pool  with  the  most  recently 
used  records  being  maintained  in  that  pool. 

• ADABAS  groups  all  fields  for  a data  record  within  the  same  physical  block. 
There  is  a maximum  of  99  repeating  group  fields  per  data  record  and  a 
maximum  of  16,777,216  records  per  file.  Up  to  255  files  are  supported  per 
data  base. 

No support is provided for direct relationships between files, but instead this
relationship is established by coupling files using common descriptor fields in
the Associator.

ADABAS  uses  an  Inverted  List,  which  enables  efficient  indexed  access  to 
records. 

This  is  done  on  the  basis  of  the  IBM  BDAM  access  method.  Direct 
access  to  a data  record  is  possible  in  ADABAS  version  4.1  through  the 
use  of  ADAM,  an  ADABAS  access  method,  without  first  accessing  the 
Associator  for  the  relevant  descriptor. 

ADAM  converts  a key  directly  to  an  Internal  Sequence  Number,  through 
an  ADABAS  randomizing  algorithm.  This  ISN  is  then  converted  to  a 
physical  block  address  by  the  Address  Converter. 

Rapid  access  is  possible  to  all  of  the  records  that  satisfy  quite  complex 
search  criteria  against  simple  data  bases. 

However,  the  counterpart  of  this,  which  applies  relatively  simple  search 
criteria  to  a large  number  of  complex  data  bases,  can  involve  a 
considerable  number  of  physical  accesses  to  the  Associator  for  each  of 
the  descriptors  if  ADAM  is  not  used. 
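The ADAM path described above (key to ISN, then ISN to physical block) can be sketched as follows. The report does not describe the randomizing algorithm, so CRC-32 is used purely as a stand-in. The class and function names are hypothetical, and collisions between keys are not handled.

```python
import zlib


class AddressConverter:
    """Maps an Internal Sequence Number (ISN) to a physical block address."""

    def __init__(self):
        self._block_of = {}

    def assign(self, isn, block):
        self._block_of[isn] = block

    def block_for(self, isn):
        return self._block_of[isn]


def adam_isn(key, max_isn):
    """Hypothetical randomizing step: hash the key into the range 1..max_isn.

    CRC-32 is a stand-in; the report does not describe the real algorithm.
    """
    return zlib.crc32(key.encode("utf-8")) % max_isn + 1


def direct_lookup(key, max_isn, converter):
    """Key -> ISN -> block address, with no Associator access for the descriptor."""
    return converter.block_for(adam_isn(key, max_isn))


converter = AddressConverter()
isn = adam_isn("EMP-00042", max_isn=1000)
converter.assign(isn, block=317)
block = direct_lookup("EMP-00042", 1000, converter)
```

The point of the sketch is the access count: the key itself yields the ISN, so the inverted list for the descriptor is never consulted.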

The performance implications in this environment are even greater when there
is significant update, addition or deletion activity against descriptor
fields, because maintenance must be carried out on the Associator for each
descriptor field so updated.

An  additional  performance  exposure  is  encountered  where  a data  base  requires 
multiple  levels  of  repeating  group  occurrences.  ADABAS  can  support  within  a 
data  record  one  repeating  field,  but  is  unable  within  that  repeating  field  to 
support  additional  repeating  fields,  unless  specified  as  "multiple-valued"  fields. 

• Additional files may be defined, if necessary, which contain further levels of
repeating  fields.  However,  these  will  require  additional  accesses  to  the 
Associator,  Address  Converter  and  data  record  for  those  related  records  which 
would  of  course  be  "coupled"  by  ADABAS.  There  is  no  provision  for  a direct 
pointer  from  one  data  record  to  another  data  record  which  would  offer 
optimum  performance  in  this  environment. 

B.  DMS  II  (BURROUGHS) 


1. GENERAL DESCRIPTION

• DMS  II  (Data  Management  System  II)  supports  the  Burroughs  6700  and  7700 
series  and  is  an  integral  part  of  the  Burroughs  operating  system  (MCP).  It 
offers  both  a Data  Base  capability  and  Data  Communications  Support  and 
supports  hierarchies,  together  with  Networks  and  Inverted  Files. 

• DMS  II  supports  a number  of  physical  organizations  which  include: 

Sequential
Indexed Sequential
Random
Indexed Random

• DMS  II  also  supports  unordered  and  ordered  list  structures. 

• An  additional  physical  organization  supported  is  that  of  a bit  vector  set.  This 
set  contains  one  bit  for  each  record  in  the  set. 

The  bit  is  on  if  the  record  meets  the  set  criteria  and  off  if  it  does  not. 

Facilities  are  provided  to  generate  a bit  vector  set  which  subsequently 
can  be  used  for  rapid  searching  of  the  data  base  for  particular  field 
conditions. 

• Basically the physical structuring of the data base is similar to a chained file
scheme with master data stored in one data set and a separate data set for an
index into that master data. Separate data sets are used for each repeating
group type and allow records to be chained together.


• Data can be stored using either standard "sequential" or unordered data set
physical structuring.

DMS  II  enables  sets  to  be  defined  which  are  used  as  indexes  into  specific  data 
base  records.  Data  in  these  sets  (indices)  can  use: 

Unordered list.
Ordered list.
Index sequential.
Bit vector.
Indexed random.


• The  bit  vector  organization  can  be  used  to  index  into  master  records  and 
contains  bit  switches  that  have  been  generated  to  reflect  particular  field 
content  of  the  master  records. 


A bit  vector  can  be  generated  by  an  application  program  command  and 
can  be  used  subsequently  for  rapid  selection  of  records  that  meet 
particular  criteria. 

• Normally the index random and index sequential organizations are used to
index into master records. The unordered and ordered list organizations are
normally used for embedded data sets (i.e., chains).

The DMS II method of relating master records to dependent records is
not  based  on  chain  pointers  from  one  chain  record  to  the  next,  but 
rather  on  the  set  which  can  be  thought  of  as  a table  which  points  to 
dependent  records. 

This  table  itself  may  be  made  up  of  multiple  records  that  are 
themselves  chained  together. 

a.  Data  Relationships 

• DMS II supports data interrelationships with unidirectional pointers. Thus, two
relationships must be defined for a bidirectional relationship.

• DMS  II  supports  a number  of  types  of  relationships.  These  are: 

Counted: Keeps a count in the target of the links pointing to it.

Symbolic: Contains the symbolic key of the related record.

Verified Link: Contains a verification value together with direct
address pointers.

Self-Correcting Link: Contains a symbolic key and a direct address
pointer. The pointer is automatically corrected when it is found to be
wrong, by comparing the key in the link with the key in the record
pointed to by the direct address pointer.

Unprotected Link: Uses a disk address with no correction and no
verification. If the data pointed to is reorganized then it could point to
the wrong record.

• The above types of relationships provide the Data Base Administrator
flexibility in controlling the automatic updating of pointers.

For example, the self-correcting link enables the pointer to be updated
automatically on the first reference to a target record, after that
record has moved in the data base.

However, the unprotected link provides no correction facility in the
event of data movement.


• DMS  II  supports  a secondary  indexing  capability  which  is  referred  to  as  a "set." 
Multiple  sets  are  allowed  to  provide  different  access  paths  to  the  same  record 
based  on  different  fields,  combinations  of  fields,  or  conditions  of  fields  (i.e., 
salaries  greater  than  $10,000). 


• Sets  can  be  both  automatic  (system  controlled  and  maintained)  and  manual 
(controlled  and  maintained  by  the  programmer).  These  manual  sets  can 
contain  data  from  the  target  record. 


They  are  referred  to  then  as  subsets  and  enable  the  application  program 
to  retrieve  specific  information  from  the  subset  rather  than  accessing 
the  data  base  directly. 


• While  sets  and  subsets  can  be  used  to  access  the  data  base  in  a sequence 
different  from  that  in  which  it  is  physically  maintained,  it  appears  that  access 
is  provided  only  to  the  relevant  dependent  record. 

The  capability  to  progress  from  this  record  up  the  data  structure  to 
higher  level  records  and  so  produce  an  inverted  list  does  not  appear  to 
be  provided. 


• DMS  II  data  bases  support  variable  length  records  and  automatically  compact 
records  for  best  utilization  of  disk  storage.  If  the  application  program  adds 
information  to  the  data  base  in  different  formats,  DMS  II  will  automatically 
compact  the  data. 

• The application program sees an Internal view of the data base. An External
or Conceptual view is not provided. However, the Data Base Administrator
can specify that an application program sees only a subset of the full data
base. Additionally, a further level of control is provided by permitting an
application program to open a data base for read only, read/write or write
only access.

b.  Data  Definition  Language 

• The  DASDL  language  is  an  easy-to-use,  free-form  language  that  allows  the 
Data  Base  Administrator  to  describe: 

Data  and  its  characteristics. 

The  properties  of  information  in  the  data  base. 

Multiple  ways  of  retrieving  information. 

Security  of  information. 

Relationships  between  data  in  the  data  base. 

• The Data Base Administrator describes the basic data elements in the data
base. 

• The  Data  Base  Administrator  may  also  define  and  control  the  verification  of 
information  in  the  data  base. 

• The DASDL permits the Data Base Administrator easily to verify the value of
information  in  the  data  base. 

• In  addition  to  describing  the  data  base  and  its  properties,  the  Data  Base 
Administrator  describes  how  sets  of  information  are  used  by  applications  and 
systems  for  efficient  retrieval  of  information. 

The Data Base Administrator describes the sets that allow access to the data.
The ONLINE set (described below) will fulfill the requirements for the
personnel system.

ONLINE SET OF PERSONNEL KEY IS (NAME, STREET)


Access  is  permitted  to  the  set  of  personnel  information  through  the  keys 
NAME  and  STREET. 


• The  Data  Base  Administrator  may  also  specify  a variety  of  retrieval  methods 
for  efficient  access  to  the  data  base.  For  example,  assume  that  the  most 
efficient  retrieval  of  personnel  information  is  achieved  through  an  index 
random  access  method,  and  the  most  efficient  retrieval  of  information  to  the 
accounting  system  is  achieved  by  using  an  index  sequential  access  method. 
The  DASDL  to  describe  this  is: 


PERSONNEL DATA
(NAME...
CITY...
STATE...
EMPLOYEE...
);

ONLINE SET OF PERSONNEL KEY IS (NAME, STREET) INDEX RANDOM;

ACCOUNTING SET OF PERSONNEL KEY IS EMPLOYEE INDEX SEQUENTIAL;


If  an  application  program  adds  personnel  information  to  the  data  base  using 
the  ONLINE  set,  the  information  can  be  retrieved  using  the  ONLINE  set  or 
using  the  ACCOUNTING  set. 

• For those applications where access to a selected subset of information is
required, the Data Base Administrator can describe subsets or partial
collections of information.

This  allows  a member  of  a subset  of  information  to  be  retrieved  without 
retrieving  the  entire  set.  The  subset  then  becomes  a secondary  index  to 
the  information  which  can  contain  data  from  the  target  record. 

This  can  be  very  useful  for  reporting  upon  subsets  of  information. 

PERSONNEL DATA
(NAME...
STREET...
CITY...
STATE...
EMPLOYEE...
EMPLOYEE-CODE...
);

ONLINE SET OF PERSONNEL KEY IS (NAME, STREET) INDEX RANDOM;

ACCOUNTING SET OF PERSONNEL KEY IS EMPLOYEE INDEX SEQUENTIAL;

MANUF SUBSET OF PERSONNEL WHERE EMPLOYEE-CODE = 1030
KEY IS NAME INDEX SEQUENTIAL;


MANUF allows the respective employees to be retrieved randomly or
sequentially by name. Since DMS II automatically builds an index to the
employees whose employee code is 1030, the entire Personnel file does not
have to be read.
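The effect of the MANUF subset can be sketched as a small sorted index over the qualifying records. The sketch assumes an in-memory list in place of the PERSONNEL data set. The function names and sample records are hypothetical, and the automatic maintenance that DMS II performs on the index is not modelled.

```python
import bisect


def build_subset(records, condition, key):
    """Return a sorted list of (key value, record position) for qualifying records."""
    return sorted((record[key], position)
                  for position, record in enumerate(records)
                  if condition(record))


def find_by_key(subset, records, value):
    """Random retrieval: binary search of the subset for one key value."""
    i = bisect.bisect_left(subset, (value,))
    if i < len(subset) and subset[i][0] == value:
        return records[subset[i][1]]
    return None


def in_key_order(subset, records):
    """Sequential retrieval: walk the subset in key order."""
    return [records[position] for _, position in subset]


# Hypothetical PERSONNEL records.
PERSONNEL = [
    {"NAME": "WHITE", "EMPLOYEE-CODE": 1030},
    {"NAME": "JONES", "EMPLOYEE-CODE": 2040},
    {"NAME": "ADAMS", "EMPLOYEE-CODE": 1030},
]
MANUF = build_subset(PERSONNEL, lambda r: r["EMPLOYEE-CODE"] == 1030, "NAME")
```

Both retrieval styles read only the entries in the subset, so records with other employee codes are never examined.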

DMS II provides a variety of data base retrieval methods. Information can be
retrieved using unordered, bit vector, index sequential, index random and
random access methods.

The  index  sequential  access  method  allows  information  to  be  retrieved 
sequentially  or  randomly.  If  information  is  retrieved  sequentially,  it  is 
retrieved  in  the  sequence  defined  by  the  Data  Base  Administrator. 
Index sequential organization is useful for applications where information
must be frequently retrieved in a specific order (report) and also
retrieved  randomly.  Typically,  the  index  sequential  access  method  is 
used  when  the  information  is  stable  and  has  relatively  few  additions  or 
deletions. 

The  index  random  access  method  allows  information  to  be  retrieved 
randomly.  Index  random  uses  less  disk  storage  than  a random  access 
method  where  the  randomizing  algorithms  provide  an  even  distribution 
across  the  data  base.  Index  random  is  used  if  data  does  not  have  to  be 
retrieved  in  an  ordered  manner  or  if  data  is  volatile. 

The  random  access  method  does  not  use  a table  or  an  index  like  index 
sequential  or  index  random  to  retrieve  information.  The  random 
organizational  method  uses  a randomizing  algorithm  for  quick  access  to 
the  desired  record. 

DMS  II  allows  a bit  index  to  be  created  from  the  data  for  rapid  mass  retrieval 
of  information. 

The bit index (bit vector) contains boolean information indicating whether
each record meets the Data Base Administrator's defined conditions.

Since  bit  indexes  require  small  amounts  of  disk  storage,  a scan  of  the 
index  can  be  done  very  quickly. 

Bit indexes are most valuable for applications where the entire file of
information  must  be  scanned  to  satisfy  a request. 

DMS  II  does  not  automatically  update  bit  indexes  when  the  data  in  the  relevant 
records  changes.  Provision  is  made  for  the  application  program  to  regenerate 
the  bit  index  to  ensure  it  reflects  the  current  status  of  all  records  in  the  data 
base.  The  volatility  of  the  data  base  determines  the  frequency  of  regenerating 
this  index. 
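A minimal sketch of a bit index, and of why it must be regenerated after updates, is given below. It assumes one bit per record, held in a single integer, and a condition supplied as a function. The record contents and names are hypothetical.

```python
def generate_bit_index(records, condition):
    """Build one bit per record: 1 if the record meets the condition, else 0.

    The bits are packed into a single Python int; bit i belongs to record i.
    """
    bits = 0
    for position, record in enumerate(records):
        if condition(record):
            bits |= 1 << position
    return bits


def select(records, bits):
    """Return the records whose bit is on, without re-testing the condition."""
    return [record for position, record in enumerate(records)
            if bits >> position & 1]


def high_paid(record):
    """Hypothetical condition of the kind a set might be defined on."""
    return record["SALARY"] > 10000


# Hypothetical personnel records.
personnel = [
    {"NAME": "ADAMS", "SALARY": 9000},
    {"NAME": "BAKER", "SALARY": 12000},
    {"NAME": "CLARK", "SALARY": 15000},
]

index = generate_bit_index(personnel, high_paid)

# An update leaves the old index stale until the program regenerates it.
personnel[0]["SALARY"] = 11000
stale = [r["NAME"] for r in select(personnel, index)]
index = generate_bit_index(personnel, high_paid)
fresh = [r["NAME"] for r in select(personnel, index)]
```

Before regeneration the selection still omits ADAMS, which is the exposure described above for volatile data.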

DMS  II  allows  the  Data  Base  Administrator  to  relate  information  between  two 
or  more  members  of  the  data  base. 

An example of a relationship between two members is a data base
containing customer shipping information and customer billing
information.

Assume  that  60%  of  the  time  when  a customer's  shipping  information  is 
retrieved,  it  is  necessary  to  know  something  about  billing  location. 

The  Data  Base  Administrator  can  define  the  relationship  between 
customer  shipping  and  billing  information  using  DASDL. 

Then  when  an  application  retrieves  customer  shipping  data,  billing 
information  may  also  be  optionally  retrieved. 


DMS II will place a pointer in the customer's shipping information to the
customer billing data which can be used to retrieve the related
customer billing data in one or more disk accesses.


Relationships  may  be  expressed  between  several  members  of  information  in 
the  data  base  or  one  member  of  information  can  have  several  members  of 
information  related  to  it. 

As an example, a manufacturing firm has parts information and
assembled  components  information  in  its  data  base. 

There  is  a relationship  between  an  assembled  component  and  parts  that 
make  up  that  component. 

The  Data  Base  Administrator  can  describe  the  relationship  between  the 
parts  and  components  using  DASDL. 

Then  when  an  application  program  retrieves  component  information, 
optionally,  the  parts  information  making  up  the  component  can  be 
retrieved. 

DMS II uses these relationships as the basis for the formation of hierarchies
and networks. Further, multiple sets (indexes) to information can be defined
at any level or node of the hierarchy or network.

c.  Link  Relationships 

Relationships that relate two members of information are referred to as link
relationships in DMS II. Link relationships allow DMS II to retrieve the related
member of information generally in one or more accesses depending on the
link path.

DMS  II  provides  five  link  relationships: 

The  unprotected  link  can  be  used  when  information  is  very  stable. 

The  verified  link  is  more  powerful  than  the  unprotected  link,  because 
DMS  II  always  verifies  the  validity  of  the  link  relationship  prior  to 
passing  the  related  information  to  an  application. 

The self-correcting link is more powerful than the verified link since it
verifies the validity of the relationship and tries to correct the
relationship. The self-correcting link is extremely valuable if the
related information is very volatile.


The  symbolic  link  is  used  for  relating  extremely  volatile  information. 
Use  of  the  symbolic  link  assumes  that  the  related  information  is 
continually  removed  and  added  to  the  data  base. 
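The behaviour of the link types after a record moves can be compared in a short sketch. It assumes a toy data set in which records live at integer addresses, and it uses the record key as the verification value of a verified link. All names are hypothetical and this is not DMS II syntax.

```python
class DataSet:
    """A toy data set: records live at integer addresses and can be moved."""

    def __init__(self):
        self.at = {}          # address -> record
        self.by_key = {}      # symbolic key -> address

    def store(self, address, record):
        self.at[address] = record
        self.by_key[record["KEY"]] = address

    def move(self, key, new_address):
        """Simulate reorganization: relocate a record to a new address."""
        old_address = self.by_key[key]
        self.store(new_address, self.at.pop(old_address))


def follow_unprotected(data_set, link):
    """Trust the address; after a reorganization this may be the wrong record."""
    return data_set.at.get(link["address"])


def follow_verified(data_set, link):
    """Check the record found against the key held in the link."""
    record = data_set.at.get(link["address"])
    if record is None or record["KEY"] != link["key"]:
        raise LookupError("link no longer points at " + link["key"])
    return record


def follow_self_correcting(data_set, link):
    """On a mismatch, look the record up by key and repair the address."""
    record = data_set.at.get(link["address"])
    if record is None or record["KEY"] != link["key"]:
        link["address"] = data_set.by_key[link["key"]]
        record = data_set.at[link["address"]]
    return record


def follow_symbolic(data_set, link):
    """Always resolve through the key; no address is kept in the link."""
    return data_set.at[data_set.by_key[link["key"]]]
```

The trade-off in the sketch matches the text: the unprotected link is cheapest but silently wrong after a move, the verified link detects the error, the self-correcting link pays for one keyed lookup and then repairs itself, and the symbolic link pays for a keyed lookup on every reference.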

d.  Data  Manipulation  Language 

DMS  allows  information  to  be  retrieved  from  the  data  base  using  higher  level 
languages  such  as  COBOL  or  ALGOL. 

The  application  programmer  is  not  responsible  for  the  definition  of  data  or 
files.  The  Data  Base  Administrator  defines  the  data  and  its  characteristics 
using  the  DASDL. 

There  is  no  requirement  for  the  application  programmer  to  redefine  the 
data. 

Additionally,  the  Data  Base  Administrator  can  define  only  part  of  the 
data  base  to  be  available  to  the  application  program. 

When the COBOL compiler compiles the program, the compiler
automatically places a description of the data (as described by the Data
Base Administrator using DASDL) into the application program.

The programmer uses self-identifying, easy-to-use words such as FIND,
STORE, LOCK or MODIFY to retrieve and update information.

These  verbs  are  structured  around  COBOL  syntax,  and  are  translated  by 
the  COBOL  compiler  into  the  necessary  calls  to  request  information 
from  the  data  base. 

2. BASIC FUNCTIONAL CAPABILITIES

a. Easy Accessibility

• DMS  II  does  not  support  a machine-oriented  language  but  uses  COBOL  as  its 
commercial  language  and  ALGOL  as  its  scientific  language. 

• DMS II is structured around a COBOL syntax. The additional verbs enable a
COBOL application to be quite readable and understandable.

• DMS  II  provides  an  integrated  Data  Communications  support. 

b.  Multiple  Views  of  Data 


• DMS II supports sequential, random and indexed retrieval. Additionally,
multiple indices (sets) may be defined to permit access to the data base in
several different sequences.

c.  Data  Consolidation 


• DMS  II  has  no  limit  on  the  number  of  repeating  group  types  supported  per 
entity  (data  base  record).  However,  each  separate  repeating  group  type 
resides  in  a different  data  set  and  consequently  requires  a separate  I/O  access. 

• The  number  of  occurrences  per  repeating  group  type  is  unlimited  and  resides  in 
the  data  set  for  that  repeating  group. 


• Variable  length  repeating  group  occurrences  are  supported  together  with  data 
compaction. 


• Relationships  between  entities  (records)  are  uni-directional  and  are  initially 
created  by  the  programmer.  They  can  be  optionally  maintained  by  DMS  II  and 
may  use  pointers  based  either  on  symbolic  keys  or  direct  addresses. 

Self-correcting  pointers  containing  a combination  of  symbolic  keys  and 
direct  access  pointers  can  be  used. 

In the event that the related record has moved and the direct access
pointer  no  longer  points  directly  to  the  correct  record,  the  symbolic  key 
is  used  to  retrieve  the  record  and  the  direct  access  pointer  is  then 
corrected. 

There  is  no  limit  to  the  number  of  relationships  that  can  be  established  per 
data  base.  It  is  not  known  whether  any  restriction  is  placed  on  the  number  of 
relationships  per  entity  (record)  or  program. 

3. DATA INDEPENDENCE

a.  Levels  of  Mapping 

DMS  II  supports  only  one  level  of  mapping,  an  Internal  level.  The  Conceptual 
or  External  levels  of  mapping  are  not  supported. 

As  field  definition  is  also  not  supported  by  DMS  II,  the  application  program 
views  the  physical  data  storage  structure  directly.  While  the  Data  Base 
Administrator  can  define  only  a subset  of  the  physical  data  base  to  be  viewed 
by  the  application  program,  he  is  somewhat  constrained  in  the  extent  to  which 
he  can  restructure  the  data  base. 

b.  Data  Base  Changes 

Change  Device  Type:  While  the  data  base  must  be  reloaded,  programs  do  not 
have  to  be  changed  or  recompiled. 

Change  Access  Method:  A change  of  access  method  requires  reloading  of  the 
data  base.  It  is  not  known  whether  this  will  also  require  a change  in  program 
logic  or  program  recompilation. 

Change Entity View: A change in the way in which a record is viewed will
require  reloading  of  the  data  base  together  with  program  recompilation.  It  is 
not  known  whether  it  will  also  require  a change  in  program  logic. 

• Add New Entity: No change is necessary in program logic, recompilation or
data base reloading to add a new record to the data base.

• Add  New  Repeating  Group  Type:  Because  a new  repeating  group  type  to  be 
added  to  the  data  base  must  reside  in  a separate  data  set,  this  will  require  a 
change  in  program  logic,  program  recompilation  and  data  base  reloading. 

• Add  New  Relationship:  Similarly,  the  addition  of  a new  relationship  between 
records  or  parts  of  the  data  base  will  require  a change  in  program  logic, 
program  recompilation  and  data  base  reloading,  as  the  application  program 
must  refer  to  the  appropriate  relationship  by  name. 

• Add  New  Field  to  Repeating  Group:  The  addition  of  a new  field  to  a repeating 
group  will  require  data  base  reloading,  program  recompilation,  and  may 
require  a change  in  program  logic  depending  upon  the  way  in  which  the 
program  has  been  written. 

• Change Field Format: A change in field format will require data base
reloading and program recompilation. It will not require a change in program
logic.
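The change categories above can be collected into a single lookup table, which makes the pattern easier to see. The table below restates the text only: "unknown" marks the cases the report says are not known, and "maybe" marks the one case that depends on how the program was written. The structure and names are an illustrative choice.

```python
# Columns follow STEPS: change program logic, recompile programs, reload data base.
STEPS = ("program logic", "recompilation", "reload")

CHANGE_IMPACT = {
    "change device type":               ("no",      "no",      "yes"),
    "change access method":             ("unknown", "unknown", "yes"),
    "change entity view":               ("unknown", "yes",     "yes"),
    "add new entity":                   ("no",      "no",      "no"),
    "add new repeating group type":     ("yes",     "yes",     "yes"),
    "add new relationship":             ("yes",     "yes",     "yes"),
    "add new field to repeating group": ("maybe",   "yes",     "yes"),
    "change field format":              ("no",      "yes",     "yes"),
}


def changes_requiring(step):
    """List the changes for which the given step is definitely required."""
    column = STEPS.index(step)
    return [change for change, impact in CHANGE_IMPACT.items()
            if impact[column] == "yes"]
```

Read by column, the table shows that every change except adding a new entity forces a reload, which is consistent with the single level of mapping described under Levels of Mapping.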

4.  DATA  INTEGRITY 

a.  Exclusive  Control 


• DMS  II  maintains  Exclusive  Control  at  the  physical  record  (block)  level.  It 
does  not  provide  support  for  program  isolation. 

• The  detection  and  resolution  of  deadlocks  is  provided  solely  by  exclusive 
control.  However,  it  is  believed  that  no  facility  is  provided  to  detect  two 
programs  both  attempting  to  update  two  records  but  not  in  the  same  sequence. 
While  the  system  maintains  exclusive  control,  the  programmer  must  detect 
and  resolve  these  deadlocks. 
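One common way for application programs to avoid the deadlock described above is to acquire locks in a single agreed order. The sketch below illustrates that convention with Python threads standing in for two programs. It is an application-level technique assumed for illustration, not a DMS II facility, and the block numbers are hypothetical.

```python
import threading

# One lock per physical block; block numbers are hypothetical.
BLOCK_LOCKS = {7: threading.Lock(), 12: threading.Lock()}
balances = {7: 100, 12: 100}


def transfer(source, target, amount):
    """Update two blocks, always acquiring their locks in ascending block order.

    Taking locks in one agreed order means two programs can never each hold
    the block the other is waiting for.
    """
    first, second = sorted((source, target))
    with BLOCK_LOCKS[first]:
        with BLOCK_LOCKS[second]:
            balances[source] -= amount
            balances[target] += amount


# Two programs update the same pair of blocks in opposite logical order.
a = threading.Thread(target=transfer, args=(7, 12, 30))
b = threading.Thread(target=transfer, args=(12, 7, 10))
a.start()
b.start()
a.join()
b.join()
```

Without the sort, the first program could hold block 7 while waiting for block 12 and the second could hold block 12 while waiting for block 7, which is the update-sequence conflict the text refers to.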

5. RECOVERY/RESTART


a.  Recovery 


• DMS II automatically logs after images for recovery and provides a
Copy/Restore utility to produce a backup copy of the data base. This backup
copy can be produced concurrently with normal processing.

• No  provision  is  made  for  log  summarization.  A different  audit  tape  (log  tape) 
is  used  for  each  data  base. 

• The  smallest  recoverable  unit  is  the  row.  The  row  is  some  multiple  of  blocks 
and  both  on-line  dumping  (i.e.,  copying)  and  reconstruction  of  the  data  base 
can  proceed  concurrently  while  the  data  base  is  in  use  by  programs.  When  a 
bad  row  is  discovered  (i.e.,  an  I/O  error  occurs),  only  that  row  is  marked  as 
unavailable  and  the  remainder  of  the  data  base  can  still  be  used. 
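Recovery from a backup copy plus logged after images, restricted to the rows that need it, can be sketched as follows. The layout of the backup and of the audit trail entries is hypothetical, and deletions are not modelled.

```python
def restore(backup, audit_trail, bad_rows=None):
    """Rebuild rows from a backup copy plus after images logged since the copy.

    backup      : dict of row number -> dict of record key -> record contents
    audit_trail : list of (row number, record key, after image), oldest first
    bad_rows    : rows to rebuild; None rebuilds every row
    """
    targets = set(backup) if bad_rows is None else set(bad_rows)
    rebuilt = {row: dict(backup[row]) for row in targets}
    for row, key, after_image in audit_trail:
        if row in rebuilt:
            rebuilt[row][key] = after_image
    return rebuilt


# Hypothetical backup and audit trail.
BACKUP = {0: {"A": 1}, 1: {"B": 2}}
AUDIT = [(0, "A", 5), (1, "B", 7), (1, "C", 9), (0, "A", 6)]

only_row_1 = restore(BACKUP, AUDIT, bad_rows=[1])
everything = restore(BACKUP, AUDIT)
```

Because after images are applied oldest first, the last logged image of each record wins. Rebuilding only row 1 leaves row 0 untouched, which corresponds to the rest of the data base remaining usable.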

b.  Batch  Restart 

• While  logging  before  images  for  batch  restart  is  provided  by  DMS  II,  the 
programmer  must  code  the  restart  procedures  in  his  application  program. 

• Utility  support  is  provided  to  place  a tape  mark  on  the  audit  tape,  but  no 
provision  is  made  to  extract  data  from  the  data  base  buffer  after  system 
failure  to  close  out  the  log  tape. 

• Intermediate  restart  points  can  be  defined  for  batch  programs  so  that  the 
backout  may  proceed  to  that  point  and  then  the  program  may  be  restarted. 
The  application  program  must  code  and  control  the  reprocessing  from  that 
point. 

c. On-Line Restart

DMS II provides support for message logging. However, its ability to
synchronize the message log with the data base log is not known. For
example, if messages are logged to a different tape than the data base, the
ability to synchronize message and data base logging is severely compromised.

No  support  is  provided  for  task  restart  by  DMS  II.  The  application  programmer 
is  involved  in  the  restart. 

However,  system  restart  is  supported  at  Halt/Load  time  when  DMS  II  is 
reinitialized. 


6.  DATA  SECURITY 


DMS  II  application  programs  are  tied  to  the  data  base  at  compile  time  and 
specify  the  particular  parts  of  the  data  base  that  they  are  to  access. 
Consequently,  the  decision  as  to  accessibility  of  information  to  the  application 
program  is  made  at  compile  time  rather  than  execution  time. 


A security key enables DMS II to define the level of access permitted for
on-line or batch users of the system. Only those users with the appropriate
authority are permitted to access relevant data bases.

However,  once  access  is  granted,  the  application  program  is  able  to 
reference  any  data  within  the  data  base. 


DMS  II  maintains  a security  restriction  level  in  terms  of  the  file  or  item.  As 
each  repeating  group  type  resides  in  a separate  file,  access  restriction  is 
provided  down  to  the  repeating  group  type.  The  access  options  supported  are 
either  retrieve  or  update. 


The  programmer  is  involved  in  the  data  security  enforceability  and  must 
specify  the  access  option  in  the  OPEN  verb. 

7. EASE OF USE


a.  Data  Base  Administrator 


• The  Data  Definition  Language  is  a procedural  language  but  as  a by-product 
produces  COBOL  structures  for  the  program. 

• A utility  is  provided  to  print  the  audit  tape.  This  audit  tape  could  be  used  for 
extraction  of  performance  information  that  may  assist  the  design  and 
measurement  process. 

• DMS  II  does  not  provide  a Data  Dictionary  for  the  control  of  data  in  the  data 
base. 

While  DMS  II  provides  for  the  generation  of  COBOL  data  structures  as  a 
by-product  of  the  Data  And  Storage  Definition  Language,  this  is  only  a 
small  part  of  the  total  function  necessary  for  a complete  Data 
Dictionary. 

No  facilities  are  provided  for  control  of  fields,  repeating  groups,  data 
bases,  programs  or  applications  through  reports  and  cross  referencing. 

• Externals  education  is  provided,  together  with  reference  documentation. 
However,  the  information  available  relating  to  internals  is  very  limited. 

b.  Application  Programmer 

• The  application  programmer  uses  a procedural  Data  Manipulation  Language 
that  is  based  on  the  COBOL  syntax  but  with  additional  verbs  giving  a total  of 
twenty  data  base  operations. 

• Only  one  record  type  can  be  retrieved  per  command. 

• Facilities are provided for searching the data base using indexed (key) fields
only. High, low and equal searches are supported, together with boolean
search capability.

c.  End  User 


Facilities  are  provided  for  end  user  language  support  and  access  to  a DMS  II 
data  base. 

8. COST/PERFORMANCE

a.  Measurable  Costs 


DMS  II  is  available  for  purchase  at  $70,000  or  an  unlimited  lease  is  available 
costing  $6,400  per  annum.  In  both  cases,  an  annual  maintenance  charge  of 
$7,000  applies.  Limited  lease  plans  are  available  for  monthly  license  fees  of 
$1,667  for  a three  year  lease  or  $1,600  for  a five  year  lease.  These  prices  are 
indicative  only  and  may  vary. 
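
The indicative prices above can be compared over a planning period with simple arithmetic. The following is a minimal sketch that uses only the figures quoted in the text. It assumes that the maintenance charge applies to purchase and unlimited lease only, as stated, and the function names are illustrative.

```python
# Indicative 1978 figures quoted in the text (US dollars).
PURCHASE_PRICE = 70_000
UNLIMITED_LEASE_PER_YEAR = 6_400
MAINTENANCE_PER_YEAR = 7_000  # stated for purchase and unlimited lease
LIMITED_LEASE_MONTHLY = {3: 1_667, 5: 1_600}  # lease term in years -> monthly fee


def purchase_cost(years: int) -> int:
    """One-off purchase price plus annual maintenance."""
    return PURCHASE_PRICE + MAINTENANCE_PER_YEAR * years


def unlimited_lease_cost(years: int) -> int:
    """Annual lease fee plus annual maintenance."""
    return (UNLIMITED_LEASE_PER_YEAR + MAINTENANCE_PER_YEAR) * years


def limited_lease_cost(term_years: int) -> int:
    """License fees over the full term of a limited lease.

    Assumption: no maintenance is added, because the text states the
    maintenance charge only for purchase and unlimited lease.
    """
    return LIMITED_LEASE_MONTHLY[term_years] * 12 * term_years
```

For example, `purchase_cost(5)` adds five years of maintenance to the purchase price, while `limited_lease_cost(5)` totals sixty monthly license fees.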

b.  Real  Memory 


The  real  memory  required  is  approximately  200K  bytes  for  the  first  user,  plus 
the  MCP  and  additional  requirements  which  may  exist. 

c.  Performance  Constraints 

DMS  II  has  multithread  capability  with  a lock  out  level  at  the  physical  block. 

A common  buffer  pool  is  used,  organized  by  data  base. 

One  repeating  group  type  is  stored  per  data  set.  This  possibly  has  the  greatest 
impact  on  performance  as  the  number  of  repeating  group  types  in  the  data 
base  increases. 

Each additional repeating group type will require I/O access to a
separate data set.

However,  there  is  no  limit  to  the  number  of  data  sets  per  data  entity 
(data  base  record)  that  are  supported. 

• DMS  II  uses  both  direct  relationships  as  well  as  symbolic  relationships. 



© 1978  by  INPUT,  Menlo  Park,  CA  94025.  Reproduction  Prohibited. 



IV  C.  DMS  170  (Control  Data) 


C. DMS 170 (CONTROL DATA)


1. GENERAL DESCRIPTION

• DMS  170  is  the  data  base  system  supported  by  Control  Data  Corporation 
(CDC)  for  the  CYBER  170  series  computers  (CYBER  70  Models  72,  73,  74)  and 
the  6000  series  computer  systems.  DMS  170  is  based  on  the  CODASYL  Data 
Base  Task  Group  specifications. 

• DMS  170  Version  2 contains  a very  complete  Data  Definition  Language  (DDL) 
implementation  of  the  CODASYL  specifications.  However,  it  does  differ  quite 
markedly  from  the  complete  CODASYL  specifications  in  that  set  relationships 
are  implemented  in  a more  limited  fashion  in  DMS  170. 

• The  Data  Manipulation  Language  (DML)  of  CODASYL  is  not  utilized  in  DMS 
170.  Instead,  DMS  170  uses  standard  COBOL  verbs  to  access  the  Data  Base. 

a.  CYBER  Record  Manager 

• DMS  170  is  built  on  a modular  approach  to  Data  Base  management.  DMS  170 
enables  an  installation  to  select  the  appropriate  system  module  combination  to 
fit  its  particular  needs. 

• As a first step in instituting this building-block approach, Control Data
implemented the CYBER Record Manager for many standard higher-level software
products. Products designed specifically for the Data Base environment use
CYBER Record Manager either to carry out their individual input/output tasks
or to standardize file structures among products:

QUERY UPDATE, the conversational language.

DDL, the Data Description Language.

CDCS, CYBER Data Base Control System, the controlling module.


• A file  created  through  CYBER  Record  Manager  can  be  accessed  by  any  other 
product  that  requires  the  created  file. 

• The  second  step  toward  modular  data  management  technology  is  provided  by 
QUERY  UPDATE,  a flexible  language  with  the  capabilities  for  users  of  widely 
varying  levels  of  computer  competence. 

Experienced programmers can use QUERY UPDATE capabilities - such
as using arithmetic expressions in complex procedures, organizing data
into arrays, searching interactively for individual records or record
elements, and formatting complex reports.

Key management personnel and non-technical employees can manipulate
files through QUERY UPDATE by following procedures established by
more experienced users.

• The  third  step  in  the  modular  approach  to  data  management  adds  two  products 
- Data  Description  Language  (DDL)  and  CYBER  Data  Base  Control  System 
(CDCS)  - to  form  a single  point  of  data  integration. 

• The  DDL  enables  those  responsible  for  the  overall  design  of  the  data  base  to 
structure  and  define  data.  The  physical  creation  of  the  data  base  and  its 
subsequent  updating  and  integration,  however,  are  handled  by  CYBER  Record 
Manager,  in  conjunction  with  another  language  such  as  COBOL  or  FORTRAN 
Extended. 

• Instead of actually creating or accessing a file, DDL describes the
relationships between all information items stored in the data base files. A
logical file description in one program can be reconciled with descriptions in
other programs of the physical record description of the actual file.

• COBOL and QUERY UPDATE are equipped to integrate DDL descriptions with
user compilations (COBOL) or execution (QUERY UPDATE). Once the data is
defined with DDL and a COBOL program has been called to interact with the
file, data base creation, integration and updating are carried out under control
of CDCS (CYBER Data Base Control System).

• Recognizing  the  individual  record  descriptions  generated  by  DDL,  CDCS 
maintains  the  physical  representation  of  information  in  the  data  base  and 
converts  data  according  to  the  individual  user's  program  needs.  CDCS  also 
functions  as  the  data  base  controller  from  an  installation  management 
standpoint.  Because  of  its  centralized  structure,  CDCS  acts  as  a software 
monitor  to  protect  data  integrity  and  ensure  file  security. 

b.  CYBER  Data  Base  Control  System 

• CDCS acts as a centralized software monitor to control and interpret data
base access requests from application programs (see Exhibit IV-C1).

CDCS  ensures  data  integrity  by  preventing  incompatible  uses  of  data  by 
different  applications  and  makes  these  applications  more  convenient  by 
translating  various  individual  input/output  data  description  formats  to 
compatible  terminology. 

CDCS conforms in large measure to the CODASYL Data Base Task
Group specifications.

• CDCS  works  with  CYBER  Record  Manager  by  accepting  calls  from  COBOL 
batch  programs.  CDCS  is,  for  all  practical  purposes,  transparent  to  the 
COBOL  programmer.  Programs  need  not  be  changed  substantially  since 
conventional  COBOL  verbs  are  used  in  accessing  files.  This  represents  the 
greatest  divergence  of  DMS  170  from  the  CODASYL  specifications  - the 
CODASYL  Data  Manipulation  Language  commands  are  not  supported  by  DMS 
170. 

EXHIBIT IV-C1

DMS 170 CONTROL SYSTEM ORGANIZATION


CDCS integrates compiled and disk-stored schemas and corresponding
sub-schemas previously prepared through DDL compiler processing. The schema
description determines what is required of CYBER Record Manager.

Distinct from the comments regarding the DML above, the DDL as implemented
in DMS 170 provides a very high level of compatibility with the
CODASYL DDL specifications.

Because  of  the  modularity  of  Control  Data's  data  base  software  design, 
however,  CDCS  is  not  required  as  a centralized  control  module  for  all  data 
base  applications  at  an  installation.  If  it  is  more  convenient,  CDCS  can  be 
used  as  a utility  to  aid  the  data  base  administrator  or  by-passed  completely. 

However,  this  flexibility  and  modularity,  while  advantageous  in  permitting 
conversion  to  a completely  centralized  system,  raises  the  question  of  data 
integrity  and  data  security  of  the  data  base  since  it  is  possible  for  access  and 
modification  to  the  data  base  to  be  made  outside  the  control  of  CDCS. 

c.  Relational  Data  Bases 

The Relational data base facility of DMS 170 allows COBOL users to retrieve
data from several files, joined together logically in a structure called a
relation. This capability of DMS 170 does not imply a Relational Data Base (as
is the subject of much research at present) but instead identifies two files in a
relation entry which are to be joined for reference by CDCS and for selection
of record occurrences through the use of the schema and sub-schema
directories.

It is at the level of the RELATION entry of DMS 170 that most divergence
from the CODASYL DDL specifications occurs. DMS 170 does not support the
SET relationships of the CODASYL DDL. The DMS 170 RELATION entry only
identifies two files which are logically related through a common key or
control field.

• It is not possible to specify any SET OCCURRENCE SELECTION modes or SET
ORDER modes, as is possible with the CODASYL SET entry. The full power of
CODASYL Data Base technology is therefore unavailable to users of DMS 170, a
limitation compounded by the lack of CODASYL DML command support in DMS 170
COBOL programs.

d.  Data  Description  Language  (DDL) 

• The DMS 170 DDL is a very high-level implementation of the CODASYL DDL
specifications. DMS 170 DDL supports the CODASYL schema and sub-schema
specifications.

• The  schema  is  a detailed  description  of  the  entire  data  base.  The  description 
is  generated  by  DDL  statements  that: 

Name  the  schema. 

Organize  the  schema  into  addressable  storage  units  (files)  called  areas. 

Describe each type of record together with the characteristics of the
data comprising the record.

Join  files  in  relationships. 

• The  DDL  statements  are  used  as  input  to  the  DDL  compiler  which  produces  an 
object  schema  or  schema  directory. 

• The  system  uses  the  schema  directory  to  relate  an  application  program's 
symbolic  references  to  the  actual  data  in  the  data  base.  Only  one  schema 
exists  for  a data  base. 

• The  sub-schema  is  a detailed  description  of  the  portion  of  the  data  base  that  is 
available  to  an  application  program.  The  description  is  generated  by  DDL 
statements  that: 

Identify the schema when applicable.

Name  the  sub-schema. 

Specify  the  areas  needed. 

Define  the  content  and  structure  of  the  applicable  records. 

Indicate  any  changes  in  data  format  required  by  the  application 
program. 

• This  format  conversion  of  data  stored  on  the  data  base  before  presentation  to 
the  application  program  represents  a high  level  of  implementation  of  the 
CODASYL  DDL. 

• An  application  program  uses  the  sub-schema  directory  to  obtain  descriptions 
of  applicable  data.  Any  number  of  sub-schemas  can  exist  for  a data  base. 

• CDCS  (CYBER  Data  Base  Control  System)  is  the  DMS  170  controlling  module 
that  monitors  and  interprets  data  base  access  requests  from  application 
programs  that  are  using  the  schema. 

• CDCS  accepts  calls  from  the  application  programs,  interrogates  the  schema 
and  sub-schema  for  compatibility,  translates  data  formats  from  the  programs' 
language  to  the  internal  format  of  the  data,  and  determines  the  requirements 
for  ultimate  input/output  processing. 

• CDCS  checks  for  the  validity  of  data  entering  the  system  if  validity  checking 
is  specified  in  the  schema.  This  implementation  of  the  CODASYL  DDL 
CHECK  clause  represents  another  high-level  DDL  capability  of  DMS  170. 

Data  items  must  conform  to  stated  picture  specifications  and  fall 
within  legal  ranges  when  range  checking  is  specified. 

Data base procedures incorporated into the schema directory can
provide for additional validity checking by specifying routines to be
executed at specific points in the CDCS processing environment.
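
The picture and range checks described above can be sketched as follows. This is only an illustration of the idea, not the CDCS implementation: it assumes a tiny picture subset (`9(n)` for numeric and `X(n)` for alphanumeric items, with `n` treated as a maximum length), and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed picture subset: 9(n) numeric digits, X(n) any characters.
_PICTURE = re.compile(r"^([9X])\((\d+)\)$")


def conforms_to_picture(value: str, picture: str) -> bool:
    """Return True if the value fits the stated picture specification."""
    match = _PICTURE.match(picture)
    if match is None:
        raise ValueError(f"unsupported picture: {picture!r}")
    kind, length = match.group(1), int(match.group(2))
    if len(value) > length:
        return False
    return value.isdigit() if kind == "9" else True


def check_item(value: str, picture: str,
               legal_range: Optional[Tuple[int, int]] = None) -> bool:
    """Apply the picture check, then the range check if one is specified.

    Range checking is only meaningful for numeric (9) pictures here.
    """
    if not conforms_to_picture(value, picture):
        return False
    if legal_range is not None:
        low, high = legal_range
        return low <= int(value) <= high
    return True
```

A call such as `check_item("0042", "9(5)", (1, 100))` passes both checks, while a value outside the legal range or containing a non-digit is rejected before it would be stored.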

• Data  base  procedures  specified  in  the  schema  are  initiated  by  CDCS.  These 
special  purpose  routines  perform  a variety  of  operations  that  can  include: 

Checking  validity  of  data  items  prior  to  storage. 

Performing  data  conversions  that  are  not  supported  by  the  system. 

Calculating the values of actual or virtual items and delivering them
to the user's working storage.

Performing  additional  processing  on  items  during  data  base  retrieval  or 
update. 

Handling special error conditions detected within CDCS.

e. DDL Statements


• The  SCHEMA  NAME  clause  identifies  the  schema.  Only  one  schema  can  be 
defined  for  one  data  base. 

• The  AREA  NAME  clause  identifies  the  name  to  be  used  to  reference  each  area 
and  indicates  the  data  base  procedures  to  be  CALLED  at  OPEN  or  CLOSE 
time.  The  maximum  number  of  areas  allowed  for  a data  base  is  63.  An  area  is 
a part  of  disk  storage  that  can  be  accessed  in  the  same  manner  as  a file  and  is 
opened  and  closed  by  the  COBOL  OPEN  and  CLOSE  verbs. 

• The RECORD DESCRIPTION ENTRY identifies each RECORD NAME and
indicates the area WITHIN which it resides. Optionally, a data base procedure
to be called for STORE, DELETE, MODIFY, FIND or GET operations can be
specified. (It should be noted that these functions are implemented by the
standard COBOL verbs: READ for GET, and WRITE or REWRITE for the
STORE or MODIFY functions.)
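
The relationship between the COBOL verbs and the data base procedures named in a record description can be pictured as a small dispatch table. The sketch below is illustrative only. It assumes WRITE corresponds to STORE and REWRITE to MODIFY, omits DELETE and FIND because the text does not name their COBOL equivalents, and uses hypothetical names throughout.

```python
from typing import Callable, Dict

# COBOL verb -> CODASYL function, following the text. DELETE and FIND are
# left out because their COBOL equivalents are not stated.
VERB_TO_FUNCTION = {"READ": "GET", "WRITE": "STORE", "REWRITE": "MODIFY"}


def run_verb(verb: str, record: dict,
             procedures: Dict[str, Callable[[dict], None]]) -> str:
    """Call the data base procedure registered for the verb's function.

    `procedures` maps a function name (for example "STORE") to the optional
    procedure named in the record description entry.
    """
    function = VERB_TO_FUNCTION[verb]
    procedure = procedures.get(function)
    if procedure is not None:
        procedure(record)
    return function
```

With `procedures = {"STORE": audit}`, a WRITE of a record would invoke the hypothetical `audit` routine, while a READ would proceed with no procedure call.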


• An area can contain more than one record type. The maximum number of
record types allowed in an area is 116. The maximum number of record types
allowed in a data base is 511. A maximum record size of 262,143 characters is
allowed.

• The  DATA  DESCRIPTION  ENTRY  identifies  a number  of  elementary  items 
which  together  make  up  the  record.  Elementary  items  can  be  defined  as  part 
of  a repeating  group  item  with  the  elementary  item  as  the  smallest  unit  of 
named  data.  The  maximum  number  of  elementary  items  allowed  is  16,383;  the 
maximum  item  size  is  32,767  characters.  The  maximum  number  of  items  per 
record  is  680. 

• The  only  group  items  that  can  be  specified  in  the  schema  are  repeating  groups. 
Non-repeating  group  items  are  not  supported  in  the  schema.  A repeating 
group  is  a collection  of  related  data  items  organized  in  a hierarchical 
structure;  the  entire  structure  is  repeated  a number  of  times.  Group  items 
can  be  nested  to  three  levels. 
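
The area and record type limits quoted in this section (63 areas per data base, 116 record types per area, 511 record types per data base) lend themselves to a simple pre-check of a proposed schema. The following is a minimal sketch under the assumption that a schema can be summarized as a mapping from area name to its record type names. It does not reflect how the DDL compiler itself reports errors.

```python
from typing import Dict, List

MAX_AREAS = 63
MAX_RECORD_TYPES_PER_AREA = 116
MAX_RECORD_TYPES_PER_DATA_BASE = 511


def limit_violations(areas: Dict[str, List[str]]) -> List[str]:
    """Return a message for each quoted limit the proposed schema exceeds."""
    problems = []
    if len(areas) > MAX_AREAS:
        problems.append(f"{len(areas)} areas exceed the limit of {MAX_AREAS}")
    for area, record_types in areas.items():
        if len(record_types) > MAX_RECORD_TYPES_PER_AREA:
            problems.append(
                f"area {area} has {len(record_types)} record types; "
                f"the limit is {MAX_RECORD_TYPES_PER_AREA}"
            )
    total = sum(len(record_types) for record_types in areas.values())
    if total > MAX_RECORD_TYPES_PER_DATA_BASE:
        problems.append(
            f"{total} record types exceed the data base limit of "
            f"{MAX_RECORD_TYPES_PER_DATA_BASE}"
        )
    return problems
```

An empty list means the proposed layout stays within the three limits checked here. The item and record size limits could be added in the same way.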

• Repeating  data  items  are  specified  by  including  the  OCCURS  clause  in  the 
Data  Description  Entry.  A repeating  data  item  can  occur  a fixed  number  of 
times  in  each  record,  or  a variable  number  of  times  depending  on  the  value  of 
another  data  item  in  the  record. 

• A data  item  that  is  repeated  a variable  number  of  times  is  described  with  the 
OCCURS  data-name  TIMES  clause.  The  exact  number  of  times  the  data  item 
is  repeated  in  a record  occurrence  depends  on  the  value  of  the  data  item 
referenced  by  data-name  in  the  OCCURS  clause. 

• The CHECK IS VALUE clause must be included to designate the minimum and
maximum number of occurrences that are allowed.
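
The behaviour of a variable repeating item can be illustrated with a short sketch: the value of a controlling data item decides how many occurrences are present, and that value must lie between the declared minimum and maximum. The record layout (a dictionary) and all names below are assumptions made for illustration.

```python
from typing import Any, Dict, List


def repeating_items(record: Dict[str, Any], count_name: str, items_name: str,
                    minimum: int, maximum: int) -> List[Any]:
    """Return the occurrences selected by the controlling data item.

    `count_name` plays the role of data-name in OCCURS data-name TIMES;
    `minimum` and `maximum` play the role of the CHECK limits.
    """
    count = record[count_name]
    if not minimum <= count <= maximum:
        raise ValueError(
            f"{count_name}={count} is outside the range {minimum}..{maximum}"
        )
    items = record[items_name]
    if len(items) < count:
        raise ValueError(f"record holds fewer than {count} occurrences")
    return list(items[:count])
```

For a record whose controlling item holds 2, only the first two occurrences are returned, whatever space the record reserves for more.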

• The size and class of the data item are designated by the PICTURE clause or
TYPE clause.


• The  schema  describes  the  storage  characteristics  of  all  data  items  in  the  data 
base.  The  sub-schema  describes  only  those  data  items  to  be  accessed  by  one 
or  more  application  programs.  Within  certain  limitations,  the  characteristics 
of  the  data  items  can  be  changed  in  the  sub-schema  to  meet  the  requirements 
of  the  application  programs. 

• The process of changing data characteristics between the schema and the
sub-schema is called conversion. Conversion occurs during record mapping which
is  the  CDCS  operation  for  generating  a record  image.  Some  of  the  conversions 
between  the  schema  and  the  sub-schema  are: 

Coded  arithmetic  to  numeric  picture. 

Numeric  picture  to  coded  arithmetic. 

Coded  arithmetic  to  coded  arithmetic. 

Numeric  picture  to  numeric  picture. 

• Record  mapping  is  performed  by  CDCS  whenever  a user  function  reads  or 
writes  a data  base  record.  The  schema  and  sub-schema  record  descriptions  are 
used  to  generate  a record  image.  Each  data  item  is  transferred  from  the 
source  record  to  the  target  record.  Conversion  is  performed  before  the  data 
item  is  transferred. 
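
Record mapping on the read path can be sketched as below. The sketch models coded arithmetic as a Python integer and a numeric picture as a zero-padded digit string of a given width. These representations, the unsigned-value restriction and the names are all assumptions for illustration, not the CDCS internal formats.

```python
from typing import Any, Dict, Tuple

# ("coded", 0) = coded arithmetic; ("picture", n) = numeric picture of n digits.
ItemDescription = Tuple[str, int]


def convert(value: Any, source: ItemDescription, target: ItemDescription) -> Any:
    """Convert one unsigned data item from its source to its target format."""
    number = value if source[0] == "coded" else int(value)
    if target[0] == "coded":
        return number
    width = target[1]
    text = str(number).zfill(width)
    if len(text) > width:
        raise ValueError(f"{number} does not fit a picture of {width} digits")
    return text


def map_record(stored: Dict[str, Any],
               schema: Dict[str, ItemDescription],
               sub_schema: Dict[str, ItemDescription]) -> Dict[str, Any]:
    """Build the record image seen by the program.

    Only items named in the sub-schema are transferred, and each item is
    converted before it is placed in the target record.
    """
    return {
        name: convert(stored[name], schema[name], description)
        for name, description in sub_schema.items()
    }
```

On a write, the same `convert` function could be applied with the source and target descriptions swapped.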

• In addition to resolving differences between the schema and sub-schema
descriptions, any of the following clauses included in the schema description of
a data item are handled during data record mapping:

RESULT CLAUSE: The data base procedure is executed to determine
the actual or virtual value of the data item.

ENCODING OR DECODING CLAUSE: The specified data base procedure
is executed to perform a non-standard conversion of the data item
for security encoding.

CHECK CLAUSE: The data item is checked for the restrictions
imposed by this clause.

CALL CLAUSE: The data base procedure is executed when the
specified function is performed on the data item.

• Certain  attributes  of  the  schema  areas  are  specified  in  the  DATA  CONTROL 
ENTRY.  This  entry  also  designates  the  system  or  user  library  containing  all 
the  data  base  procedures  specified  in  the  schema. 

• An  AREA  CONTROL  ENTRY  supplies  special  information  related  to  an  area 
of  the  schema.  The  following  information  can  be  specified  in  the  entry: 

The  data  items  representing  the  primary  and  alternate  keys. 

The  collating  sequence. 

Logging  requirements. 

The means for distinguishing between multiple record types within the
area.

• The  RELATION  ENTRY  of  the  DMS  170  schema  represents  the  greatest 
divergence  of  DMS  170  from  the  CODASYL  DDL  specifications. 

• A Relation defines a directed path joining areas described in the schema. An
area can be joined only once in any one relation. The relation entry specifies
the areas to be joined, allocates a name to the relation, and specifies the data
items to be used as join terms. The maximum number of relations allowed
in a data base is 4,095.

Each relation entry begins with a RELATION NAME statement. This is
followed by a series of statements pairing the data item JOIN term of one
area and the corresponding data item in the area joined to it.

The  JOIN  clause  specifies  the  data  items  that  CDCS  must  inspect  to  join  the 
areas  in  which  the  data  items  reside.  The  order  in  which  the  data  items  are 
specified  determines  the  direction  of  the  relationship. 

Within  the  JOIN  clause  a specified  data  item  in  one  area  is  equated 
with  an  identical  data  item  in  another  area;  the  two  areas  are  thus 
related  through  specification  of  a common  data  item. 

The  relational  operator  EQ  must  appear  between  each  pair  of  data 
items  included  in  the  JOIN  clause. 

While the DMS 170 Relation Entry enables related areas to be joined together,
these relationships are restricted to relationships between records across areas,
rather than relationships between records within areas. In addition, the
relational operator EQ is the only permitted operator and is specified in the
schema.

No  capability  is  provided  in  the  DMS  170  COBOL  program  to  specify 
different  relational  operators  (such  as  GT,  LE,  GE  or  LT).  (However, 
refer  to  RESTRICT  clause  below,  in  the  sub-schema.) 
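
The effect of a JOIN with the EQ operator can be sketched as an equality join between the records of two areas, traversed from the first-named area to the second. The data layout (lists of dictionaries) and the function name are assumptions. The sketch shows the logical result only, not how CDCS locates the records.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def join_areas(first_area: List[Record], second_area: List[Record],
               first_item: str, second_item: str) -> List[Tuple[Record, Record]]:
    """Pair each record of the first area with the records of the second
    area whose join item is equal (EQ) to its own join item."""
    by_value: Dict[Any, List[Record]] = {}
    for record in second_area:
        by_value.setdefault(record[second_item], []).append(record)
    return [
        (first, second)
        for first in first_area
        for second in by_value.get(first[first_item], [])
    ]
```

Because equality is the only comparison, an index on the second area's join item is sufficient. Operators such as GT or LE would need a different access strategy.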

f.  Sub-Schema  Statements 


• Within the constraints imposed by the lack of SET relationships in DMS 170
and the lack of PRIVACY LOCKS, the DMS 170 sub-schema provides a sub-set
of the CODASYL sub-schema specifications.

• The  main  distinction  occurs  in  the  capability  to  convert  between  data  items  as 
specified  in  the  schema,  and  data  items  specified  in  the  sub-schema,  through 
the  use  of  the  PICTURE  or  TYPE  clauses. 

• The ability to RENAME data items between the schema and sub-schema is
provided.

• An  additional  capability  is  the  means  of  defining  88  condition-name  clauses  in 
the  sub-schema  relating  the  VALUES  of  those  conditions  through  a range  of 
literals. 

• The  most  significant  difference  between  the  DMS  170  sub-schema  and  the 
CODASYL  sub-schema  however,  is  in  the  use  of  the  RELATION  DIVISION 
rather  than  the  CODASYL  SET  DIVISION. 

• The  DMS  170  Relation  Division  enables  Relation  names  to  be  specified  and  a 
RESTRICT  clause  to  specify  the  record  qualification  criteria  that  must  be 
satisfied  by  a record  occurrence  that  is  to  be  returned  to  the  COBOL 
Application  program  work  area. 

The  RESTRICT  clause  enables  a number  of  relational  operators  to  be 
specified  between  certain  identifiers  in  the  program  and  identifiers  or 
data  items  in  the  record. 

Only  one  RESTRICT  clause  can  be  included  for  a given  record.  A 
maximum  of  1,024  entities  (identifiers,  operators,  literals  or  data- 
names)  can  appear  in  RESTRICT  clauses  for  any  one  relation. 
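
Record qualification under a RESTRICT clause can be sketched as a filter applied before a record is returned to the program. The sketch below assumes that conditions are combined with AND only and that each operand is tagged as either a program identifier or a literal. Both are simplifications, and the names are hypothetical.

```python
import operator
from typing import Any, Dict, List, Tuple

# Relational operators named in the text.
OPERATORS = {
    "EQ": operator.eq,
    "GT": operator.gt,
    "GE": operator.ge,
    "LT": operator.lt,
    "LE": operator.le,
}

# (data item, operator, (operand kind, operand)) where kind is
# "identifier" (a value supplied by the program) or "literal".
Condition = Tuple[str, str, Tuple[str, Any]]


def operand_value(operand: Tuple[str, Any], identifiers: Dict[str, Any]) -> Any:
    """Resolve a program identifier to its current value, or pass a literal."""
    kind, value = operand
    return identifiers[value] if kind == "identifier" else value


def qualifies(record: Dict[str, Any], restrict: List[Condition],
              identifiers: Dict[str, Any]) -> bool:
    """Return True if the record satisfies every condition in the clause."""
    return all(
        OPERATORS[op](record[item], operand_value(operand, identifiers))
        for item, op, operand in restrict
    )
```

The conditions themselves are fixed in the sub-schema. Only the identifier values change from one program run to the next.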

• Boolean relations are supported between the various qualifying criteria.
However, it should be recognized that these relations are specified in the
sub-schema and therefore are less flexible than relations established by the
application program through the identifiers defined by the sub-schema.

• It  can  be  argued  therefore  that  this  capability  does  give  the  Data  Base 
Administrator  an  additional  control  over  the  ability  of  an  application  program 
to  extract  related  records  from  the  data  base.  However,  it  does  significantly 
limit  the  flexibility  of  DMS  170  COBOL  application  programs. 

g. Security


A significant limitation in the data base support offered by DMS 170 is the
omission of CODASYL PRIVACY LOCK clauses in the schema and the
sub-schema. This represents a potential security exposure in a DMS 170
environment.


While the sub-schema capability of DMS 170 does enable the data base
administrator to limit an application program to the sub-set of the data base
that it is authorized to access, the omission of PRIVACY LOCK clauses makes
all of the records and data items specified in the sub-schema accessible to the
application program, which is able to carry out any retrieval, modification,
addition or deletion of information in those records.


A further  potential  security  exposure  is  the  ability  to  bypass  the  CYBER  Data 
Base  Control  system  completely  and  instead  access  the  data  base  through  the 
CYBER  Record  Manager. 


2. BASIC FUNCTIONAL CAPABILITIES


a.  Easy  Accessibility 

DMS 170 provides a rather complete implementation of the CODASYL Data
Definition Language (DDL), both for the specification of schemas and
sub-schemas. However, DMS 170 does not support the CODASYL Data
Manipulation Language (DML).

The programming languages supported by DMS 170 include COMPASS (the
Control Data Assembler language), COBOL for commercial use, and
FORTRAN or ALGOL for scientific use.


An  end  user  language  capability  is  provided  through  the  use  of  QUERY 
UPDATE. 

Data Communications support is provided through the appropriate Operating
System (NOS/BE or NOS) facilities.

b.  Multiple  Views  of  Data 

DMS  170,  through  its  direct  interface  with  the  CYBER  Record  Manager,  is 
able  to  support  sequential,  random,  index  and  multiple  index  file  organizations. 

c.  Data  Consolidation 

The  use  of  a Relation  entry  in  DMS  170  rather  than  the  Set  relationship  entry 
of  CODASYL  represents  the  greatest  difference  between  DMS  170  and  the 
CODASYL  Data  Base  specifications. 

No  effective  restriction  is  placed  on  the  number  of  repeating  group  types  per 
data  entity  (data  base  record)  or  occurrences  per  repeating  group  type.  The 
maximum  number  of  record  types  supported  in  one  area  is  116,  and  the 
maximum  number  of  areas  allowed  per  data  base  is  63. 

Variable  length  occurrences  are  supported  through  the  use  of  the  OCCURS 
TIMES  clause  where  a variable  number  of  occurrences  may  be  specified  by  the 
content  of  a field  or  program  identifier. 

The  maximum  number  of  nested  levels  supported  by  DMS  170  is  limited  to  the 
maximum  number  of  relations  allowed  - 4,095. 

As relationships can only be established across areas, the maximum number of
entity relationships per data base record is apparently limited by the maximum
number of areas supported per data base - 63. This also appears to be the
limit of the number of entity relationships per program.

3. DATA INDEPENDENCE

a.  Levels  of  Mapping 

• DMS 170 does not support the Set relationship entry of CODASYL.
Accordingly, it is limited in its ability to change (through the use of CODASYL
SET OCCURRENCE SELECTION specifications) the mapping between the schema
and sub-schema. Consequently, DMS 170 is unable to provide a selective
support for the conceptual level of mapping.

• DMS  170  permits  field  level  format  translation  to  occur  between  the  schema 
and  sub-schema  where  elements  required  by  an  application  program  are 
specified  in  the  sub-schema.  This  provides  a high  level  of  data  independence 
between  the  format  of  data  as  stored  on  the  data  base  and  the  format  of  data 
as  required  by  the  application. 

• However,  no  provision  is  made  to  specify  data  elements  in  the  application 
program  in  a sequence  different  from  that  sequence  specified  in  the  schema. 
Thus,  the  full  capability  of  field  level  data  independence  is  not  available  to 
DMS  170. 

b.  Data  Base  Changes 

• A Change  of  Device  Type  can  be  accommodated  through  the  use  of  utilities  to 
reload  the  data  base.  No  logic  changes  or  recompilation  is  necessary. 

• A Change  of  Access  Method  can  be  achieved  through  the  CYBER  Record 
Manager,  with  no  change  in  program  logic  or  program  recompilation.  All  that 
is  required  is  a reload  of  the  data  base. 

• Similarly,  a Change  in  Entity  View  can  be  accommodated  just  by  a data  base 
reload. 

• The  Addition  of  a New  Entity  (data  base  record)  can  be  achieved  without 
changes  in  program  logic,  program  recompilation  or  a need  to  reload  the  data 
base. 

• The Addition of a New Repeating Group Type - that is, a new data base record -
will require program logic changes and recompilation because of the approach
taken with the DMS 170 DML, which is not a standard CODASYL DML. Add and
delete programs which may need to be aware of the various repeating group
types and relationships in a data base will need to be changed to reflect the new
repeating group type, as these programs must identify each record explicitly as
well as records joined through relations. The data base will of course have to
be reloaded.

• The Addition of a New Relationship is achieved by specifying a new
RELATION ENTRY in the schema and identifying the appropriate records in the
sub-schema. This new relationship is not transparent to the application program
but must be explicitly referenced in the program - rather than be applied
automatically by the DBMS as with other data base systems.

• The  Addition  of  a New  Field  to  a repeating  group  (record)  can  be  achieved 
without  logic  change  but  with  program  recompilation  and  data  base  reload 
because  of  the  field  level  definition  supported  in  the  sub-schema  of  DMS  170. 

• A Change  of  Field  Format  can  be  readily  achieved  in  DMS  170  without 
changed  program  logic  or  program  recompilation  because  of  the  conversion  of 
field  formats  between  that  specified  in  the  schema  and  that  specified  in  the 
sub-schema.  This  happens  automatically  such  that  the  program  is  always 
presented  with  fields  of  the  formats  specified  in  the  sub-schema  regardless  of 
how  those  fields  may  be  defined  in  the  schema  and  physically  present  on  the 
data  base. 

4.  DATA  INTEGRITY 

a.  Exclusive  Control 

• The  DMS  170  documentation  is  not  very  explicit  in  describing  the  approach 
taken  for  exclusive  control  - in  particular  the  lowest  level  at  which  exclusive 
control  is  applied. 

• It appears from the structure of DMS 170 that the lowest isolated level would
be at the record level. This record level is not isolated through exclusive
control mechanisms which will detect dead-locks. Instead, the programmer is
responsible for establishing exclusive control to avoid dead-locks and maintain
isolation.

5.  RECOVERY/RESTART 

a.  Recovery 

• DMS  170  automatically  logs  after-images  for  use  in  data  base  reconstruction 
recovery. 

• Utility  support  is  provided  to  produce  a data  base  copy  for  back-up  purposes 
and  subsequently  restore  that  back-up  copy  of  the  data  base  to  disk.  It 
appears  that  a Log  Summarization  utility  is  provided  to  ensure  that  only  the 
most  recent  activity  against  a record  is  recorded  on  the  log  tape  and  so 
optimize  recovery. 

• A utility  is  provided  to  apply  the  log  tape  to  the  restored  back-up  copy  to 
reconstruct  the  data  base  copy  in  event  of  unrecoverable  I/O  errors. 

• The  smallest  recoverable  unit  able  to  be  reconstructed  by  the  DMS  170 
recovery  utilities  is  an  area  which  can  comprise  one  or  more  data  sets. 
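
The reconstruction sequence described above (restore the back-up copy, summarize the log, apply the after-images) can be sketched in Python. This is a simplified illustration, not the DMS 170 utilities: the log layout and function names are assumptions, and record deletions are not modelled.

```python
def summarize_log(log):
    """Keep only the most recent after-image for each record key."""
    latest = {}
    for key, after_image in log:      # log entries are in chronological order
        latest[key] = after_image
    return latest


def reconstruct(backup, log):
    """Apply summarized after-images to a restored back-up copy."""
    database = dict(backup)           # leave the back-up copy itself untouched
    database.update(summarize_log(log))
    return database


backup_copy = {"R1": "a0", "R2": "b0"}
log_tape = [("R1", "a1"), ("R3", "c1"), ("R1", "a2")]
rebuilt = reconstruct(backup_copy, log_tape)
```

Summarizing first means each record is written once during reconstruction, which is the optimization the Log Summarization utility appears to provide.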

b.  Batch  Restart 


• DMS 170 automatically logs before-images during batch program execution for
use in backing out partially completed program activity in the event of a
system or program failure. Utility support is provided to close the log tape
and write a tape mark when a system failure prevents this from being done
at the time of the failure.

• A backout utility is provided to process the log tape and remove all activity
which resulted from the execution of partially completed batch programs.

• This backout can proceed either to the start of the program or, alternatively,
to intermediate restart points; checkpoints can be defined during execution by
the application program.
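
Backout from before-images can be sketched as follows. This is a minimal Python illustration under assumptions: the log layout, the `CHECKPOINT` marker and the `back_out` function are hypothetical and do not describe the actual DMS 170 log format.

```python
CHECKPOINT = "CHECKPOINT"   # hypothetical marker written when the program checkpoints


def back_out(database, log, to_checkpoint=True):
    """Undo logged updates in reverse order using before-images.

    Each log entry is either the CHECKPOINT marker or a (key, before_image)
    pair; a before_image of None means the record did not exist beforehand.
    """
    restored = dict(database)
    for entry in reversed(log):
        if entry == CHECKPOINT:
            if to_checkpoint:
                break                 # stop at the most recent restart point
            continue                  # otherwise keep going to start of program
        key, before_image = entry
        if before_image is None:
            restored.pop(key, None)   # the record was added by the failed run
        else:
            restored[key] = before_image
    return restored


log_tape = [("R1", "a0"), CHECKPOINT, ("R2", "b0"), ("R3", None)]
failed_state = {"R1": "a1", "R2": "b1", "R3": "c1"}
at_checkpoint = back_out(failed_state, log_tape)
at_program_start = back_out(failed_state, log_tape, to_checkpoint=False)
```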

c.  On-Line  Restart 

• Message logging is automatically carried out by the DMS 170 system and the
appropriate operating system to ensure that all on-line messages are logged
for subsequent recovery. This logging activity is directed to the same log
device as data base activity. Consequently, it appears that log synchronization
of data base and message activity is provided for by DMS 170.

• While an on-line backout utility is provided to back out partially processed
on-line transactions and restart the on-line system at its status at some point
prior to the failure, it appears from the documentation that no
provision is made by DMS 170 for the restart of partially processed tasks
which may have been backed out.

• It appears that it is the programmer's responsibility to identify the tasks
which were backed out and then reprocess those tasks (transactions) if required
by the particular application.

6.  DATA  SECURITY 

• The security mechanism implemented by DMS 170 is a password mechanism
which is applied at the Repeating Group Type (record) level. The access
options which can be specified in the sub-schema by the data base
administrator enable access to be controlled for read, update, add, or delete
operations, or exclusive control to be applied.

• The password security protection implemented by DMS 170 requires the
application program to specify the appropriate passwords defined in the
sub-schema before access is permitted to the relevant data. Thus, with the
appropriate sub-schema which defines that part of the data base of interest
and a knowledge of the appropriate passwords, an application program is given
access to the data for the access levels specified in the sub-schema.
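
The record-level password check described above can be sketched in Python. The table layout, record type name, passwords and `access_permitted` function are assumptions made for illustration; they do not reproduce DMS 170 sub-schema syntax.

```python
# Hypothetical sub-schema fragment: one password per access option,
# defined per Repeating Group Type (record).
SUBSCHEMA_PASSWORDS = {
    "CUSTOMER": {"read": "RDPASS", "update": "UPPASS"},
}


def access_permitted(record_type, operation, password):
    """True only if the sub-schema defines the operation and the password matches."""
    expected = SUBSCHEMA_PASSWORDS.get(record_type, {}).get(operation)
    return expected is not None and password == expected
```

An operation that the sub-schema does not list for a record type (here, add or delete on CUSTOMER) is refused whatever password is supplied.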

7.  EASE  OF  USE 

a.  Data  Base  Administrator 


• The DMS 170 implementation of the CODASYL Data Base Specifications is a
very comprehensive implementation at the schema level. The Data Definition
Language type is the procedural CODASYL language, with most of the
CODASYL DDL specifications implemented. For example, the RESULT IS
ACTUAL/VIRTUAL clauses are implemented so that data stored on the data
base can be used to calculate other data. That calculated data is then
presented to the application program as if it were actually present on the data
base itself.

• Another example of the DMS 170 schema sophistication is the implementation
of the CHECK clause, which enables the format of fields (numeric,
alphanumeric, etc.) to be checked for conformity with the specified field format
and the content of those fields to be validated, for example by checking that it
falls within defined ranges.

• An area where DMS 170 does not support the CODASYL DDL specifications is
the omission of the SET ENTRY. Instead, the DMS 170 RELATION ENTRY
is used. This is a far more limited capability which in effect only implements
the owner and member clause capability of the SET ENTRY. Two records are
defined by name and are then specified as being JOINED together in the
Relation Entry.

• The additional Set Entry specifications for set occurrence selection, for
ordering of member records in a set relationship, and for security of access to
that set relationship through privacy locks are not supported by the DMS 170
Relation Entry.

• Data base design aids and performance measuring aids are provided by DMS
170 through the use of additional utility programs. It is not known whether
provision is made for a Data Dictionary to assist in documentation and control.
However, utilities are provided to assist in data base restructuring.

• Ease of conversion from other data files to DMS 170 is achieved through
the way in which DMS 170 is integrated with the CYBER Record Manager.
This enables any file supported by the CYBER Record Manager to be defined
as a DMS 170 record and incorporated in the data base.

• Education on both the externals and internals of DMS 170 is provided by
Control Data. The documentation is easy to follow, with user guide examples and
reference material.

b.  Application  Programmer 

• The Data Manipulation Language supported by DMS 170 is not a standard
CODASYL DML. Instead, the DMS 170 DML is closer to standard COBOL,
with the standard COBOL verbs being interpreted by DMS 170 in terms of the
necessary accesses to the data base.

• DMS 170 is easy for existing COBOL programmers to use, as it does not require
the additional CODASYL DML commands to be understood. However, this
ease of use does prevent the DMS 170 programmer from gaining the full
benefit of the CODASYL DML. Data base operations which would
otherwise be carried out by CODASYL DML commands must be explicitly
programmed in COBOL by the DMS 170 application programmer.

• Because of the above, the number of operations supported by the DMS
170 DML is not relevant in terms of the full CODASYL DML.

• As  with  COBOL,  only  one  record  type  can  be  operated  on  per  command. 

• While  provision  is  made  for  data  searching  - permitting  high,  low  and  equal 
comparisons  to  be  made  and  comparisons  to  be  combined  by  means  of  Boolean 
logic  (and,  or,  not)  - these  data  searches  are  specified  in  the  sub-schema 
RESTRICT  clause. 

• The data search comparisons can only apply to multiple record types and
restrict the application program to searching the data base using search
strategies defined in the sub-schema. While this is a non-standard CODASYL DML
approach, it is quite good, as it enables quite complex search strategies to be
defined by the Data Base Administrator in the sub-schema and allows those
search strategies to be transparent to the sub-schema and to the application
programmer.

The  application  programmer  need  only  specify  the  appropriate  records 
to  be  read  and  set  up  the  appropriate  search  arguments  in  fields  defined 
by  the  Data  Base  Administrator  to  the  application  program. 

• These  search  arguments  are  then  applied  by  DMS  170  using  the  specifications 
and  search  strategy  outlined  in  the  RESTRICT  clause.  This  approach,  together 
with  the  Relation  Entry  in  the  schema,  enables  the  Data  Base  Administrator  to 
have  quite  complete  control  over  the  way  in  which  records  are  interrelated 
and  accessed. 

• While the application programmer is limited in data base capability
through the implementation of the non-CODASYL standard DML, the
implementation using the schema RELATION and sub-schema RESTRICT clauses
does appear to offer the DMS 170 programmer a great deal of data
independence.

• Changes in the physical structure of the data base can be made
by the Data Base Administrator without requiring considerable program
modification, as may be the case with some implementations of the full CODASYL
DML.
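
The division of labour behind the RESTRICT clause, as described in this subsection, can be sketched in Python: the Data Base Administrator defines the search strategy once, and the program only supplies search arguments. All names here (`restrict_orders`, `read_restricted`, the field names) are hypothetical, and the predicate stands in for a RESTRICT clause rather than reproducing its syntax.

```python
# Hypothetical restriction, defined once by the Data Base Administrator.
def restrict_orders(record, args):
    """Boolean combination (and, or, not) of high and equal comparisons."""
    return (record["REGION"] == args["REGION"]
            and (record["AMOUNT"] > args["MIN_AMOUNT"]
                 or not (record["STATUS"] == "CLOSED")))


def read_restricted(records, restriction, args):
    """Return only the records that satisfy the sub-schema restriction."""
    return [r for r in records if restriction(r, args)]


orders = [
    {"ID": 1, "REGION": "WEST", "AMOUNT": 500, "STATUS": "CLOSED"},
    {"ID": 2, "REGION": "WEST", "AMOUNT": 50, "STATUS": "CLOSED"},
    {"ID": 3, "REGION": "WEST", "AMOUNT": 50, "STATUS": "OPEN"},
    {"ID": 4, "REGION": "EAST", "AMOUNT": 900, "STATUS": "OPEN"},
]

# The application program only sets up the search arguments.
search_args = {"REGION": "WEST", "MIN_AMOUNT": 100}
selected = read_restricted(orders, restrict_orders, search_args)
```

Because the program never states the comparison logic, the restriction can be changed in one place without touching program code, which is the data independence benefit noted above.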

c.  End  User 

• DMS  170  supports  QUERY  UPDATE  which  is  a high-level  English  language  end 
user  tool.  It  can  be  used  in  both  on-line  and  batch  modes,  and  supports 
retrieve,  update,  add  and  delete  operations  against  the  data  base. 

8.  COST/PERFORMANCE 

a.  Measurable  Costs 

• The  cost  of  DMS  170  and  the  memory  requirements  were  not  available  for  this 
evaluation. 

b.  Performance  Constraints 

• In  the  DMS  170  architecture,  multithread  processing  is  supported  for  best 
performance. 

• A common  pool  is  used  for  input/output  buffer  management  to  permit 
optimum  use  to  be  made  of  available  storage. 

• A maximum of 116 different repeating group types (records) may be defined
per data set. Ways in which related records in a data set could be grouped for
optimum performance were not specified in the documentation.

• A data  base  record  made  up  of  many  related  repeating  group  types  (records) 
must  be  constructed  only  using  records  from  the  one  data  set.  A RELATION 
ENTRY  in  the  schema  joining  related  records  cannot  be  applied  across  data 
sets.  Instead,  only  records  within  the  same  data  set  may  be  joined.  This  does 
limit  the  ability  to  interrelate  separate  data  sets  and  implies  a data  base 
design which incorporates all related records in the one data set (file).


D. DMS 1100 (UNIVAC)


1. GENERAL DESCRIPTION

• DMS 1100 is the data base system supported by Univac for its 1100 Series
systems. DMS 1100 is based on the CODASYL Data Base Task Group
specifications published in 1969.

• Univac's plans for DMS 1100 reportedly include the addition of sub-schema
capabilities, pointer erase, and indexes.

• The basic elements of DMS 1100 are the Data Management Routine (DMR),
Data Definition Language (DDL), and Data Manipulation Language (DML).
Some utilities, initializing routines and system generation facilities are also
available.

a.  Data  Management  Routine 

• The Data Management Routine (DMR) is the Data Base Management control
module for DMS 1100. All run-units (application programs) interface to the
data base through this module. Up to 64 run-units can be active.

All data manipulation commands are executed against the data base
by the Data Management Routine.

DMR controls run-unit rollback and data base recovery.

DMR is written in 1100 Series Assembler language, and the code is
re-entrant to allow both multi-programming and multi-processing
operations.

DMR does not support telecommunications directly, but DMS 1100
supports an interface to the Transaction Interface Package (TIP).

b.  Data  Definition  Language  (DDL) 

The Data Definition Language (DDL) has been implemented in DMS 1100 as a
"standalone" language. Its record descriptions, however, are COBOL
compatible.

The  Data  Definition  Language  enables  the  Data  Base  Administrator  to 
describe  the  data  and  its  structures  to  the  system  as  a schema. 

The  schema  is  processed  by  the  Data  Definition  Language  Translator 
program  and  stored  as  a series  of  interpreted  tables  under  its  own 
unique  name  in  the  Schema  File,  where  it  is  then  available  for 
application  program  reference. 

The schema is the complete description of the data base, both its
physical and logical structures. DMS 1100 requires that all data
relationships be defined together in the one schema; that is, it is not
possible to establish data relationships that extend beyond the bounds of
the single schema.

There  can  be  multiple  schemas  defined  but  no  logical  relationship  between 
schemas.  Thus,  Data  Definition  could  be  as  simple  as  one  schema  per 
application  or  alternatively  one  schema  definition  for  all  of  the  data  of  an 
entire  company. 

The schema is coded in terms of areas, records and sets.

c. Data Base Areas

Under DMS 1100, an area is used to describe a data base in terms of its mass
storage requirements: "an 1100 file assignment to random access mass
storage." A schema can encompass 1 to 4,000 areas.

Each  area  is  subdivided  into  pages;  each  page  is  further  subdivided  into 
records.  Pages  are  described  to  the  system  as  the  number  of  words  per 
physical  block,  and  page  size  may  vary  from  one  area  to  another. 

Each  area  of  the  data  base  must  be  described  by  the  Data  Base  Administrator 
in  a separate  area  entry.  Each  area  is  assigned  an  area  name.  In  addition,  an 
AREA  CODE  clause  defines  a four  digit  code  used  by  the  system  to  assure 
uniqueness. 

The  ALLOCATE  clause  provides  specification  of  the  number  of  pages  in  the 
area  and,  optionally,  for  embedded  overflow. 

The  PAGE  clause  contains  the  number  of  words  per  page.  Only  one  page  size 
may  be  specified  within  a single  area;  however,  page  size  may  vary  from  area 
to  area. 

Optional  clauses  include  types  of  LOOKS  taken  for  area  recovery,  an  initial 
LOAD  factor,  and  a CALC  clause  to  help  the  system  resolve  conflicts  when 
records  of  different  LOCATION  MODE  exist  in  the  same  area. 

d.  Data  Base  Records 


The term record in DMS 1100 is used as it is in COBOL to describe a group of
data items (or fields). Record descriptions include information about the
content of the record (data, item and usage clauses) as well as the physical
placement of the record.

• Each record type in the schema must be described in a RECORD entry. As in
the AREA entry, each record is assigned both a NAME and a CODE.

• Location Mode is assigned at the record level to describe the access
techniques for that record. Three modes are possible: DIRECT (by data base
key), CALC (via a randomizing procedure), or VIA SET (record participation in
a logical relationship). A fourth mode, INDEX, has been added in DMS 1100.

• Since the data base key is a basic search argument for all records in the
system, its definition is important. The data base key is the address assigned
to each and every record in the schema and is made up of the area name and area
key.

• The area key is further defined in terms of page number and record number.

• Data base keys are handled by DMS 1100 at two levels: externally to the user
program in symbolic format, and internally to the system in actual notation.

• The  WITHIN  clause  can  be  used  to  define  area  and  page  limits  within  which  the 
record  occurrence  is  placed. 

• A RESERVE  clause  must  be  specified  for  records  that  are  manual  members  of 
a set  to  define  the  number  of  pointer  locations  within  a record. 

• Record  contents  are  described  by  typical  COBOL  PICTURE,  USAGE  and 
OCCURS  clauses. 

• LOCATION MODE is an important specification for a number of reasons.
Although the application programmer need not be aware of the LOCATION
MODE of the record to be accessed, the Data Base Administrator must provide
enough information for the programmer to know what work areas are to be
initialized prior to issuing a DMS 1100 command.

• The DMS 1100 Data Base Administrator must develop for the system a physical
"record placement strategy." Several area and record entries play a part in
the strategy.

The Data Base Administrator must understand and be able to balance
such area entries as ALLOCATE, PAGES, LOAD and CALC with record
entries (LOCATION MODE) to develop an optimum physical location of
records.

However, it is understood that DMS 1100 does not currently offer a
reorganization utility. As data requirements change, the initial strategy
may become obsolete. Without reorganization utilities, the only way to
change the record placement strategy is to reload the entire data base.
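
The data base key described earlier in this subsection (an area name plus an area key made up of page number and record number, held in symbolic form for the program and in actual notation internally) can be sketched in Python. The internal encoding below is purely an assumption made for illustration: the report does not describe the actual notation, and `AREA_CODES`, `RECORDS_PER_PAGE_LIMIT`, `to_internal` and `to_symbolic` are hypothetical names.

```python
from typing import NamedTuple


class AreaKey(NamedTuple):
    page_number: int
    record_number: int


class DataBaseKey(NamedTuple):
    area_name: str          # symbolic form seen by the user program
    area_key: AreaKey


# Assumed for illustration: the area code stands in for the area name
# internally, and page and record numbers are packed into one integer.
AREA_CODES = {"CUSTOMER-AREA": 12}
RECORDS_PER_PAGE_LIMIT = 1000


def to_internal(key):
    """Convert a symbolic key to the assumed internal (code, packed) pair."""
    packed = key.area_key.page_number * RECORDS_PER_PAGE_LIMIT + key.area_key.record_number
    return (AREA_CODES[key.area_name], packed)


def to_symbolic(internal):
    """Convert the assumed internal pair back to a symbolic key."""
    area_code, packed = internal
    names = {code: name for name, code in AREA_CODES.items()}
    page_number, record_number = divmod(packed, RECORDS_PER_PAGE_LIMIT)
    return DataBaseKey(names[area_code], AreaKey(page_number, record_number))


example_key = DataBaseKey("CUSTOMER-AREA", AreaKey(page_number=7, record_number=3))
```

Because the key encodes a physical page and record position, any change to the record placement strategy changes the keys, which is consistent with the reload requirement noted above.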

e.  Data  Base  Sets 


The  AREA  and  RECORD  ENTRY  portions  of  the  schema  deal  with  the  physical 
aspects  of  the  data,  while  the  Set  entry  describes  the  logical  relationships 
between  records. 


• A Set  is  a relationship  between  a single  Owner  record  type  and  one  or  more 
Member  record  types. 

• A record  type  may  participate  in  more  than  one  Set,  either  as  an  Owner  or  as 
a Member,  but  cannot  participate  in  a single  Set  as  both  Owner  and  Member. 

Each Owner-Member Set must be defined in a SET entry.

Each SET entry contains a basic description of the Owner and Member
and a SET OCCURRENCE SELECTION clause that defines the data path
the system must use to retrieve a requested record based on set
participation.

• The SET entry begins with SET NAME and SET CODE clauses, as in the other
entry sections. The SET NAME selected must be unique among the set
names in the schema, and must not exceed 30 characters. SET CODE is an
integer between 1 and 4,000 and is used as an internal substitute for SET
NAME.

• The mode clause must be specified as CHAIN (POINTER ARRAY mode has
been implemented in level 5 of DMS 1100). Each record occurrence beginning
with  the  owner  is  chained  to  the  next  record  occurrence,  and  the  last  record  is 
chained  back  to  the  owner.  Optionally,  each  record  can  be  "Linked  Prior" 
through  the  ORDER  IS  PRIOR  clause  (backward  as  well  as  forward  pointer  in 
each  record). 

• The  ORDER  clause  must  be  included  to  define  the  order  in  which  member 
record  occurrences  are  to  be  inserted  within  a set  occurrence.  Order  may  be 
specified  as  FIRST,  LAST,  NEXT,  PRIOR  and  SORTED. 

• The  FIRST  and  LAST  options  refer  to  the  position  of  the  members  relative  to 
the  owner.  FIRST  places  an  inserted  record  just  after  the  owner  record;  LAST 
places  it  at  the  end  of  all  previous  member  occurrences  for  a given  owner 
occurrence. 

• SORTED specifies that members are to be placed in the ascending or
descending key sequence specified in the MEMBER clause. NEXT and PRIOR
require an explanation of the CODASYL concept known as Currency.

• The last record stored or retrieved by a run-unit (application program) is
logged by the system as the "current record" of:

The run-unit.

Its record name.

Its area name.

All set types in which the record appears.

NEXT and PRIOR refer to a record insertion point just after or just before the
"current record"; which record is current is determined by the format of the
command issued.

Currency  can  only  be  established  by  the  FIND/FETCH  or  STORE  commands. 
Initial  record  retrieval  to  establish  currency  must  be  made  on  another  basis, 
either  record  Location  Mode,  or  Set  Participation. 

The programmer must manipulate currency indicators with care. For example,
he must be sure to establish the appropriate currency indicator before using it
as a reference point for data base insertion.

Currency  indicators  are  useful  particularly  if  the  data  base  is  to  be  accessed  in 
a sequential  fashion.  However,  the  programmer  must  thoroughly  understand 
the  data  base  structure. 
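
The ORDER options and their dependence on currency can be sketched in Python. This is a simplified model under assumptions: the chain is held as a list rather than as pointers, `current = None` stands for the owner being the current record of the set, and the `SetOccurrence` class is hypothetical rather than a DMS 1100 structure.

```python
class SetOccurrence:
    """One owner and its ordered member records (the chain, held as a list)."""

    def __init__(self, owner, order, key=None):
        self.owner = owner
        self.order = order      # FIRST, LAST, NEXT, PRIOR or SORTED
        self.key = key          # sort key function, used only for SORTED
        self.members = []
        self.current = None     # index of the current member; None means the owner

    def insert(self, record):
        """Place a member according to the set ORDER and make it current."""
        if self.order == "FIRST":
            position = 0
        elif self.order == "LAST":
            position = len(self.members)
        elif self.order == "NEXT":
            # Just after the current record; after the owner means first.
            position = 0 if self.current is None else self.current + 1
        elif self.order == "PRIOR":
            # Just before the current record; before the owner means last,
            # because the chain loops back to the owner.
            position = len(self.members) if self.current is None else self.current
        elif self.order == "SORTED":
            # Ascending sequence; duplicates go after existing equal keys.
            position = sum(1 for m in self.members if self.key(m) <= self.key(record))
        else:
            raise ValueError(self.order)
        self.members.insert(position, record)
        self.current = position
        return position


staff = SetOccurrence("DEPT-10", "NEXT")
staff.insert("ADAMS")
staff.insert("BAKER")
staff.current = 0               # as if a FIND had re-established ADAMS as current
staff.insert("CLARK")           # lands just after ADAMS, not at the end
```

The last three lines show why the currency indicator must be established deliberately: the same insertion lands in a different place depending on which record is current.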

The  OWNER  clause  defines  the  name  of  the  owner  record  type  of  the  named 
set  type.  There  may  be  one  or  more  MEMBER  clauses  each  containing  the 
name  of  the  member  record  type  within  the  set.  Each  MEMBER  clause  must 
specify  whether  the  record  is  an  AUTOMATIC  or  MANUAL  member. 

When  a record  is  STORED,  an  occurrence  of  all  sets  in  which  the  record  is 
defined  as  an  Owner  is  established.  A STORE  also  creates  an  occurrence  of  a 
record  in  any  set  in  which  it  participates  as  an  AUTOMATIC  member. 
However,  a record  must  be  INSERTED  into  each  and  every  set  occurrence  in 
which  it  is  defined  as  a MANUAL  member. 

Other  specifications  for  a member  record  include  the  ability  to  LINK  each 
member  record  to  its  owner,  selecting  ASCENDING  or  DESCENDING  sequence 
when  set  order  is  SORTED,  and  specifying  automatic  handling  of  DUPLICATE 
records. 


© 1978 by INPUT, Menlo Park, CA 94025. Reproduction Prohibited.


• Finally, the Data Base Administrator must provide a map for the system to
follow when retrieving or inserting a record in its proper logical sequence
within a set. The system must be able to find:

The set type of which the record is a member.

The specific set occurrence where the record is located.

The Location Mode of the record involved in the search or the current
record of the object set.

• A record  is  positioned  as  a participant  of  a set  type  in  the  Set  entry  OWNER 
or  MEMBER  clauses  and  Location  Mode  of  the  record  is  established  in  the 
LOCATION  MODE  clause  of  the  record  entry. 

• The  rules  for  selection  of  a particular  occurrence  of  a record  as  a member  of  a 
set  are  defined  in  the  SET  OCCURRENCE  SELECTION  clause.  Basically  the 
selection  can  be  defined  one  of  two  ways:  either  through  the  record  that  is 
CURRENT  OF  SET  (implying  a previous  selection  to  establish  the  currency  of 
that  record)  or  through  the  Location  Mode  of  the  owner  record. 

• SET  OCCURRENCE  SELECTION  through  the  Location  Mode  of  the  owner 
record  (DIRECT,  CALC,  or  VIA  SET)  indicates  an  additional  effort  to  initialize 
fields  prior  to  accessing  the  object  record. 

• The problem the Data Base Administrator faces in selecting combinations of
data base structures, set selection criteria and record location modes that
satisfy all uses of the data is a complex one. It involves balancing the
physical placement and retrieval of data with its logical uses.

f. Data Manipulation Language (DML)

The Data Manipulation Language is implemented as an extension to the Univac
1100 Series ANS COBOL language, and supports either Fieldata or ASCII
data formats. DMS 1100 additions include:

Data  base  description  sections  in  the  Data  Division. 

Additional  data  handling  verbs  in  the  Procedure  Division. 

Although these additions are based on the CODASYL Data Base Task Group
Report specifications, some of the CODASYL recommendations, such as
PRIVACY LOCKS and sub-schema, have not been implemented. DMS 1100 has
additional functions not found in CODASYL (the IMPART and DEPART commands).


g.  Data  Division  Specifications 

Additions  to  the  Data  Division  include  a Schema  Section  and  additional  usage 
statements  in  the  Common  Storage  and  Working  Storage  Sections. 


The Schema Section describes the schema to be referenced or invoked by the
run-unit. The programmer can reference one entire schema or selected
records (by record name) in one schema. Additional specifications in a Schema
Section include:

RUN-UNIT-ID 

PRIORITY  (to  establish  priority  for  a run-unit  when  the  system  has  to 
resolve  data  base  deadlocks  or  activate  queued  run-units) 


RECORD  DELIVERY-AREA  descriptions. 


OVERLAY (must be specified for records which can be overlaid by the
system).

COMMON area definition (for communication between the Data
Management Routine and the run-unit).

Specifications for error recovery techniques and procedures.

h. Procedure Division Specifications

• Sixteen verbs have been added to 1100 Series COBOL to support DMS 1100
data handling functions.

IMPART: This attaches the issuing run-unit to the Data Management
Routine and establishes a run-unit interface to the data requested in
the Schema Section of the Data Division.

DEPART:  This  terminates  the  run-unit  and  detaches  it  from  the  Data 
Management  Routine. 

OPEN: There are two forms of the OPEN verb, one to open all areas of
the invoked schema and another to open selected areas of that schema.
USAGE-MODE may then be specified at the schema or area level. A
USAGE-MODE of EXCLUSIVE or INITIAL LOAD prevents all other access to
the specified areas of the schema. PROTECTED usage mode provides
access to the data but protects it from concurrent update. Usage modes
RETRIEVAL or UPDATE may also be selected.

CLOSE: Execution of the CLOSE command prevents the issuing
run-unit from further access to the closed area or areas.

KEEP: The KEEP command is used to apply a page lock to data that
the run-unit needs temporary exclusive control of, even though the data
may remain unaltered by the run-unit. KEEP is not necessary when data
areas have been opened with the EXCLUSIVE option.

FIND/FETCH: The FIND command is used to locate requested data.
(The programmer must follow the FIND with a GET before the data is
accessible to the program.) Normally, when a program needs to both
locate and access a data record, a FETCH command is used. The
FETCH contains an implied GET function. The primary function of the
FIND/FETCH command is to establish currency for a specific record
occurrence as described in the Record Selection Expression of the
command. Optionally, this command provides the means to suppress
selected currency indicators.

GET: The GET command transfers the current record of the run-unit
into the record delivery area defined in the Schema Section.

STORE: Records are stored in the data base by issuing the STORE
command. A successful STORE inserts a record from the record
delivery area into a physical position in the data base, links it into all
sets for which it is defined as an Automatic member, and creates a new
set occurrence for each set in which it is defined as an Owner record.
Currency indicators are updated (or optionally suppressed) and the data
base key is made available to the program. The record is stored
according to its Location Mode.

DELETE: The DELETE command removes a record from the data base
and thus frees up both record space and the data base key for system
use. The object record of the DELETE command must have been
previously established as the current record of the run-unit. An
unqualified DELETE deletes the object record and, if it is an owner
record, deletes all subsequent Automatic members and removes all
Manual members from the set occurrence owned by the object record.
DELETE with the ONLY option causes deletion only if Manual or
Automatic members are not present. DELETE ALL deletes the record
and, if it is an owner record, all subsequent members, Manual and
Automatic. If the deleted members are owners as well, they will be
handled as though they were the object of the DELETE command, and so
on down through the hierarchy.

MODIFY: The MODIFY command has two functions. These are:

. To alter data in a data base record.

. To alter a data base record's set occurrence membership.

The object record of the command is the current record of the run-unit
and the modified information is placed by the program in the record
delivery area. If the record size increases as a result of the MODIFY,
and the record is defined as variable length, new space must be found in
the same page or, if specified, in the overflow areas associated with
that page. Execution of the MODIFY does not alter the data base key.

INSERT/REMOVE: INSERT and REMOVE are the two commands that
govern the occurrence of records described as Manual members of a
set. Records can be INSERTed into or REMOVEd from either specific
set(s) or all sets in which they are described as Manual members.

IF: The IF command is the COBOL IF, conditioned on specifics found in
the data base.

LOG: LOG allows the programmer to place additional data on the DMS
1100 Audit Trail Tape File.

MOVE:  The  MOVE  command  has  two  functions:  to  supply  the  contents 
of  certain  currency  indicators  to  the  run-unit  in  the  form  of  data  base 
key,  area  key,  or  area  name  and  to  convert  a data  base  key  to  area 
name  and  area  key. 

RECEIVE: This receives input messages or segments (partial
messages) from terminals.

SEND: SEND directs output messages to terminals.
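
Of the verbs listed above, DELETE has the most involved rules. Its three forms (unqualified, ONLY and ALL) can be sketched in Python as follows. This is a simplified model under assumptions: each record owns at most one flat list of members, a member belongs to one owner only, and the `Record` class and `delete` function are hypothetical rather than DMS 1100 interfaces.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.owned = []     # list of (member Record, "AUTOMATIC" or "MANUAL")


def delete(db, record, option=None):
    """Delete `record` from the set `db`; return True if it was deleted.

    option None   -> unqualified DELETE
    option "ONLY" -> delete only when the record owns no members
    option "ALL"  -> delete the record and every member it owns
    """
    if option == "ONLY" and record.owned:
        return False
    for member, membership in list(record.owned):
        if option == "ALL" or membership == "AUTOMATIC":
            delete(db, member, option)   # members that are owners cascade
        # Under an unqualified DELETE, Manual members are only disconnected.
    record.owned.clear()
    db.discard(record)
    return True


def build_example():
    """DEPT owns EMP (Automatic) and PROJ (Manual); EMP owns SKILL (Automatic)."""
    dept, emp, skill, proj = (Record(n) for n in ("DEPT", "EMP", "SKILL", "PROJ"))
    dept.owned = [(emp, "AUTOMATIC"), (proj, "MANUAL")]
    emp.owned = [(skill, "AUTOMATIC")]
    return {dept, emp, skill, proj}, dept
```

In this example an unqualified DELETE of DEPT removes DEPT, EMP and SKILL but leaves PROJ in the data base, disconnected from its former owner; DELETE ALL removes all four records; DELETE ONLY is refused because DEPT has members.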
2. BASIC FUNCTIONAL CAPABILITIES


a.  User  Accessibility 


Because DMS 1100 was developed specifically to support the CODASYL
specification, it supports the use of COBOL. However, no provision is
currently made for the use of a machine-oriented programming language or a
scientific language.

End user languages are provided by RPS 1100 (Remote Processing System
1100), a system which provides a data processing capability for end users.

RPS  interacts  with  a terminal  user  in  tutorial  fashion  and  guides  him  in 
specifying  the  form  of  his  files  and  the  operations  to  be  carried  out  on 
them  by  his  application. 

RPS  provides  data  input,  retrieval,  manipulation,  update  and  output 
capabilities,  and  builds  appropriate  application  programs  for  the  end 
user. 

RPS appears to provide capabilities similar to those offered by IBM's
DMS/VS (Display Management System).

Additionally, DMS 1100 has been enhanced by the addition of a Query
Language Processor (QLP 1100). This enables a terminal user to enter
commands which allow him to select, list and update records and items from
the data base.

No scientific end user language is provided by DMS 1100.

Data Communication support is provided by the Communications Management
System, and also by the Transaction Interface Package. These control
transaction processing and enable the user to access the data base on-line.
b.  Multiple Use of Data


• DMS  1100  supports  sequential  processing  of  the  data  base,  together  with 
random  access  (by  means  of  the  CALC  clause),  and  indexed  access  using  an 
Indexed  Sequential  location  mode. 

• Multiple indices are not supported by the Indexed Sequential location mode,
and the SORTED INDEXED set order of CODASYL (which allows the construction
of secondary indexes) is not supported.

• The provision of secondary or multiple indexes is the user's responsibility,
using sets and records that the user constructs in his data base over and
above the basic data.

• This facility is not provided automatically by DMS 1100, and its absence may
inhibit performance when access to the data base is required in a sequence
other than the data base key sequence. It is also significant when storing a
record in a SORTED set.

• The absence of SORTED INDEXED sets requires DMS 1100 to locate the
appropriate set insertion point sequentially, instead of going directly to
that insertion point through the SORTED INDEXED clause. This can have severe
performance implications if its operation is not appreciated.
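
The cost of locating an insertion point sequentially can be illustrated with
a small sketch. This is not DMS 1100 code: the function names are
hypothetical, a Python list stands in for the member chain of one set
occurrence, and a binary search stands in for whatever an index would
provide. It simply counts key comparisons for the two approaches.

```python
def sequential_insert_point(keys, new_key):
    """Walk the sorted member chain from the owner; return (position, comparisons)."""
    comparisons = 0
    for position, key in enumerate(keys):
        comparisons += 1
        if key > new_key:
            return position, comparisons
    return len(keys), comparisons


def indexed_insert_point(keys, new_key):
    """Binary search over an index of keys; return (position, comparisons)."""
    comparisons = 0
    low, high = 0, len(keys)
    while low < high:
        mid = (low + high) // 2
        comparisons += 1
        if keys[mid] > new_key:
            high = mid
        else:
            low = mid + 1
    return low, comparisons


# 1,000 member records with keys 0, 2, 4, ..., 1998 (illustrative data only)
members = list(range(0, 2000, 2))
seq_pos, seq_cost = sequential_insert_point(members, 1501)
idx_pos, idx_cost = indexed_insert_point(members, 1501)
```

Both functions find the same position, but the sequential walk examines every
member ahead of the insertion point, while the indexed search needs roughly
the base-2 logarithm of the member count.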

c.  Data  Consolidation 


• DMS 1100 supports variable entities (records) and provides no limit to the
number of repeating group types per entity (record), or the number of
occurrences per repeating group type. A repeating group type in this instance
is a member record participating in a set relationship.

• Variable  length  records  are  supported  and  no  limit  is  set  on  the  number  of 
nested  levels  for  the  data  base. 
• No limit is set on the number of entity (record) relationships per data base,
per entity (record) or per program.

3.  DATA INDEPENDENCE

a.  Levels  of  Mapping 

DMS 1100 supports only one level of mapping, an internal level through the
CODASYL schema. The subschema facility of the April 1971 CODASYL
specification is not supported by DMS 1100.

A subsetting facility is provided only as part of the data base schema and can
be copied into an application program. This does not provide the full facility
of the subschema. However, improved subschema support is provided in the
June 1978 Release 7 of DMS 1100.

The lack of a Conceptual or External level of mapping inhibits the data
independence capability of DMS 1100. The application program is based upon
a particular schema and hence on the physical structure of the data base. This
constrains the Data Base Administrator in the extent to which he can
restructure that data base in the future without involving significant program
modifications.

This limited data independence, through provision of only one mapping level,
the schema, could be offset if DMS 1100 supported the field level definition
specified by CODASYL. However, this support is not provided; neither is
format translation for fields.

While DMS 1100 does support most of the CODASYL specifications, its current
lack of effective data independence support severely inhibits the ability of a
data base to change based on future application requirements without
involving considerable program maintenance.


- 165  - 


© 1978  by  INPUT,  Menlo  Park,  CA  94025.  Reproduction  Prohibited. 


INPUT 


b.  Data Base Changes


The extent of modification necessary for data base restructuring can be gauged
by the activities required to effect the following modifications.

CHANGE  DEVICE  TYPE:  The  device  type  can  be  changed  by  reloading 
the  data  base  and  requires  no  change  in  program  logic. 

CHANGE ACCESS METHOD: A change of operating system access method
requires the data base to be reloaded, the programs to be recompiled
and the program logic changed to use the appropriate commands for the
new access method.


CHANGE  ENTITY  VIEW:  A restructuring  of  the  data  base  record  will 
require  the  data  base  to  be  reloaded,  programs  to  be  recompiled  and 
logic  to  be  changed.  This  is  necessary  because  of  the  limited  data 
independence  capability  of  DMS  1100  through  lack  of  support  for  a 
subschema  or  field  level  definition. 


ADD NEW ENTITY: A new entity can be added without requiring program
logic changes or recompilation.

ADD NEW REPEATING GROUP TYPE: The addition of a new repeating group
type requires modification of logic for add and delete programs,
recompilation and reloading of the data base.

ADD NEW RELATIONSHIP: Similarly, the addition of a new set
relationship requires program logic changes for add and delete programs
together with recompilation and reloading of the data base.

ADD NEW FIELD TO REPEATING GROUP: The addition of a new field
to a repeating group requires the program to be recompiled and the
data base to be reloaded, but generally will not require a change in
program logic provided the new field is added to the repeating group.

CHANGE FIELD FORMAT: Similarly, a change of field format will require
program recompilation and data base reloading but will not require
program logic changes.
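
The eight cases above can be summarized as a small lookup table. The sketch
below is only a restatement of the text in Python; the names CHANGE_IMPACT
and changes_requiring_logic_changes are hypothetical, and None marks an
activity the report does not mention for that change.

```python
# Impact of each structural change on DMS 1100, as stated in the text above.
# True = required, False = not required, None = not stated in the report.
# Columns: (reload data base, recompile programs, change program logic)
CHANGE_IMPACT = {
    "CHANGE DEVICE TYPE":               (True, None, False),
    "CHANGE ACCESS METHOD":             (True, True, True),
    "CHANGE ENTITY VIEW":               (True, True, True),
    "ADD NEW ENTITY":                   (None, False, False),
    # Logic changes for the next two apply to add and delete programs.
    "ADD NEW REPEATING GROUP TYPE":     (True, True, True),
    "ADD NEW RELATIONSHIP":             (True, True, True),
    "ADD NEW FIELD TO REPEATING GROUP": (True, True, False),
    "CHANGE FIELD FORMAT":              (True, True, False),
}


def changes_requiring_logic_changes(impact):
    """Return the names of changes that force program logic to be modified."""
    return [name for name, (_, _, logic) in impact.items() if logic]


logic_changes = changes_requiring_logic_changes(CHANGE_IMPACT)
```

Four of the eight changes force program logic to be modified, which is the
practical consequence of the limited data independence described earlier.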

4.  DATA  INTEGRITY 

a.  Exclusive  Control 

• DMS 1100 provides Exclusive Control at the "page" level. This typically
represents a physical record or "block" and may be part or all of a disk track.

• The application programmer is responsible for establishing Exclusive Control
through use of the KEEP/FREE commands. These function effectively as
LOCK and UNLOCK commands and ensure that no concurrently executing
run-unit (program) is able to update a record KEPT by another run-unit
until that other run-unit has FREED the record.

• While  a deadlock  is  possible,  the  system  is  able  to  resolve  deadlock  situations. 
This  deadlock  resolution  is  handled  by  manipulating  the  recovery  system. 

• DMS 1100 enables pages to be recovered by recording images of those pages
before and after any alteration. These page images are known as "looks" and
DMS 1100 records before-looks and after-looks on an audit trail tape, and
also quick-before-looks, which are the same as before-looks but are held
on disk storage.

• It is not evident whether DMS 1100 resolves deadlocks by dynamically backing
out a deadlocked program by means of quick-before-looks, or whether this is
done in an off-line mode using the recovery facility.
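
The KEEP/FREE behaviour described above can be sketched as a page-level lock
table. This is a toy model under stated assumptions, not the DMS 1100
mechanism: the class name PageLockTable and the run-unit identifiers are
hypothetical, and a refused KEEP simply returns False, whereas a real system
would more likely make the second run-unit wait.

```python
class PageLockTable:
    """Toy page-level exclusive control; one holder per page at a time."""

    def __init__(self):
        self._holder = {}   # page number -> id of the run-unit holding the page

    def keep(self, run_unit, page):
        """Try to lock a page. True on success, False if another run-unit holds it."""
        current = self._holder.get(page)
        if current is not None and current != run_unit:
            return False
        self._holder[page] = run_unit
        return True

    def free(self, run_unit, page):
        """Release a page held by this run-unit; ignore pages it does not hold."""
        if self._holder.get(page) == run_unit:
            del self._holder[page]


locks = PageLockTable()
first = locks.keep("RU-A", page=17)     # RU-A keeps page 17
blocked = locks.keep("RU-B", page=17)   # RU-B is refused while RU-A holds it
locks.free("RU-A", page=17)
retry = locks.keep("RU-B", page=17)     # succeeds once RU-A has freed the page
```

Because the lock covers a whole page, RU-B is held off even if it wants a
different record on the same page, which is why a late FREE affects other
run-units.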




5.  RECOVERY/RESTART


a.  Recovery 

• DMS 1100 records after-looks on an audit trail tape. A Copy/Restore utility is
provided to dump the data base for use as a back-up when the current version
is destroyed or corrupted in an unpredictable manner.

• A Recovery utility is then used to reconstruct the data base using the
after-looks recorded on the audit trail tape.

• No  facility  is  provided  for  summarizing  after-looks  so  that  only  the  most 
recent  version  of  a data  base  record  is  used  during  the  recovery  process. 

• The  smallest  recoverable  unit  is  the  data  set  of  the  data  base. 
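
The recovery sequence above (restore the dump, then reapply after-looks) can
be sketched as follows. This is an illustration only: the function name
recover, the dictionary standing in for data base pages and the tuple layout
of the audit trail are all assumptions, not the format used by DMS 1100.

```python
def recover(dump, after_looks):
    """Rebuild pages from a back-up dump plus after-looks in recording order.

    dump        -- dict mapping page number to page contents at dump time
    after_looks -- list of (page number, contents after the update), oldest first
    """
    pages = dict(dump)                 # start from the restored copy
    for page, image in after_looks:    # every image is applied, even superseded ones
        pages[page] = image
    return pages


dump = {1: "A0", 2: "B0"}
audit_trail = [(1, "A1"), (2, "B1"), (1, "A2")]
restored = recover(dump, audit_trail)
```

Page 1 is written twice here although only its last image matters. A
summarizing facility, which the text says is not provided, would let the
recovery run skip the superseded image.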

b.  Batch  Restart 

• The system records before-looks as well as quick-before-looks. While this is
a system responsibility, the programmer can also participate in the recording
of specific information on the audit trail tape.

• A Backout  (Rollback)  utility  is  provided  for  run-unit  rollback.  This  reverses 
the  effects  of  a run-unit  which  has  failed,  while  quick  recovery  reverses  the 
effect  of  all  run-units  which  were  active  at  the  time  of  a system  failure. 

• Intermediate  restart  points  can  be  specified  for  batch  restart. 
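
Run-unit rollback from before-looks can be sketched in the same style. The
function name backout and the record layout are hypothetical. The sketch
also assumes that exclusive control kept other run-units from updating the
same pages in the meantime, so restoring an old image does not overwrite
another run-unit's work.

```python
def backout(pages, before_looks, run_unit):
    """Undo one run-unit by applying its before-looks, newest first.

    pages        -- dict mapping page number to current contents (changed in place)
    before_looks -- list of (run-unit id, page number, contents before the
                    update), oldest first
    """
    for owner, page, image in reversed(before_looks):
        if owner == run_unit:
            pages[page] = image
    return pages


pages = {1: "A2", 2: "B1"}
log = [("RU-A", 1, "A0"), ("RU-B", 2, "B0"), ("RU-A", 1, "A1")]
backout(pages, log, "RU-A")
```

Applying the images newest first leaves page 1 in the state it had before
RU-A touched it, while RU-B's update to page 2 is untouched. Quick recovery,
as described above, would do the same for every run-unit active at the time
of failure.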

c.  On-Line  Restart 

• DMS 1100 enables on-line messages to be logged. However, it is not known
whether this requires some participation by the application programmer, or
whether messages are logged on the same audit trail tape as is used for the
data base.

• If the same audit trail tape is not used, an integrity exposure can result
from lack of synchronization between the two logs.

• A command  rollback  facility  is  used  to  backout  the  effect  of  a Data 
Management  Language  command  error,  to  permit  a retry  of  that  command. 

• System restart is supported using the quick-before-looks recorded on disk
storage. Quick recovery is then used to back out all run-units that were
active at the time of the system failure.

6.  DATA  SECURITY 

• Data security is controlled primarily by the programmer through the OPEN
clause, which enables him to specify the appropriate security level by
repeating group type and the access option in terms of READ, UPDATE or
EXCLUSIVE.

• The absence of a subschema support capability in DMS 1100 potentially
enables an application program to access any part of the data base by
specifying the appropriate access option and repeating group type.

• This introduces a possible security exposure because the Data Base
Administrator is unable to restrict this level of access by the application
program.

7.  EASE  OF  USE 

a.  Data  Base  Administrator 

• Design and Measurement aids are provided to extract statistics from the Data
Management Routine of DMS 1100 and indicate actual system performance,
together with the data base accesses necessary to follow the various set
relationships.

• Documentation and Control aids, such as a Data Dictionary, are not provided
by DMS 1100.




• The lack of data base restructuring aids, such as a reorganization utility,
inhibits the ability of the data base administrator to readily handle redesign
or restructuring of the data base to accommodate changed application
requirements.

• Utility  routines  are  provided  to: 

Compact  page  space  and  extract  space  utilization  statistics. 

Provide  a formatted  print-out  of  page  contents. 

Expand  the  size  of  pages  in  an  area. 

Initialize  the  pages  when  an  area  is  extended. 

Verify  that  set  pointers  are  correct. 

Patch  or  alter  a word  on  a page. 

Dump  and  recover  an  area  of  the  data  base. 

Place  areas  of  the  data  base  off-  or  on-line. 

• Education is provided on both the Externals and Internals of DMS 1100. In
addition, reference documentation is provided.

b.  Application  Programmer 

• The Data Manipulation Language supported by DMS 1100 is a procedural
COBOL-like language.

• The CODASYL specifications provide a number of COBOL verbs for
manipulation of the data base. DMS 1100 uses a pre-processor to expand the
data base verbs into appropriate COBOL statements for carrying out particular
functions.

• A total of sixteen operations are supported by the Data Manipulation
Language.

• Only  one  record  type  can  be  specified  per  command. 

• The data search facilities supported by DMS 1100 enable only an equal
comparison to be applied to records to be located. While this search
capability can be applied to multiple record types, the apparent lack of a
high or low search capability is somewhat limiting. No support is provided for
Boolean operators.

c.  End  User 

• User language support is provided by the Query Language Processor (QLP) and
the Remote Processing System (RPS). These are both intended to be used
on-line and enable retrieve, update, add and delete functions to be carried
out against the data base.

• Application support is provided by UNIS (Univac Industrial Systems), which
provides support for bill-of-materials, inventory and other manufacturing
applications.

8.  COST/PERFORMANCE

a.  Measurable  Costs 

• DMS 1100 is provided at no charge as part of the standard software available
for the 1100 Series.

b.  Real  Memory 

• The  real  memory  required  is  approximately  50K  words  for  the  first  user.  The 
incremental  memory  for  each  additional  user  is  not  known. 
c.  Performance Constraints


• DMS 1100 supports a multithread data base architecture with Exclusive
Control lock-out at the "page" (block) level. While this provides lock-out at
the physical record level, participation by the programmer in KEEP and FREE
commands can imply a possible performance constraint if the application
programmer does not issue the FREE at the earliest possible time.

• Input/Output  buffer  management  is  provided  using  a common  data  base  pool. 

• Data  is  grouped  with  no  limit  in  the  number  of  repeating  group  types  per  data 
set,  or  the  number  of  data  sets  used  per  data  entity  (record). 

• Support  is  provided  for  DIRECT  set  relationships,  using  direct  pointers  for 
efficient  access  to  related  records. 
E.  IDMS (CULLINANE AND ICL)


1.  GENERAL DESCRIPTION

• IDMS (Integrated Data Base Management System) implements a subset of the
CODASYL data base specifications as proposed by the CODASYL Programming
Languages Committee and the Data Description Languages Committee in
1975.

• IDMS  was  written  by  the  B.F.  Goodrich  Company  of  Akron,  Ohio,  in  1971  and 
was  based  upon  the  CODASYL  Data  Base  Task  Group  report  of  April  1971. 
The  marketing  and  development  rights  for  IDMS  were  sold  to  the  Cullinane 
Corporation  of  Boston,  Massachusetts,  in  1973. 

• Cullinane has since sold the exclusive marketing and development rights for
IDMS on ICL equipment to ICL. The marketing and development rights for the
PDP 11/70 version of IDMS have also been sold to Digital Equipment
Corporation, which markets and supports it as DBMS 11.

• IDMS is the only data base management system based on the CODASYL data
base specifications that runs on IBM equipment. IDMS is supported on IBM
System/360 and System/370 under DOS, DOS/VS, OS, OS/VS1 and OS/VS2.
This overview of IDMS places particular emphasis on the version
implemented for IBM equipment.

a.  Data  Description  Language  (DDL) 

• IDMS  is  designed  to  provide  data  base  facilities  for  ANS  COBOL  programs  and 
any  other  host  language  that  supports  a CALL  statement  or  equivalent. 
• IDMS supports network and hierarchical data structures and provides separate
language facilities for the description of data (the Data Description Language)
and the manipulation of data (the Data Manipulation Language).

• IDMS  supports  both  the  definition  of  data  base  schema  and  subschema  through 
the  DDL.  While  there  is  only  one  complete  schema  for  a data  base,  there  may 
be  any  number  of  subschemas,  each  describing  a specific  combination  of 
records,  sets  and  areas  which  apply  to  a given  application  or  program. 

• Additionally,  a Device  Media  Control  Language  (DMCL)  is  used  to  specify  the 
physical  subset  of  the  data  bases  and  the  buffering  requirements  for  a 
subschema  (and  therefore  programs  using  the  subschema).  Each  subschema 
references  a DMCL  description,  and  where  a number  of  programs  need 
concurrent  access  to  a data  base  (for  example  in  an  on-line  environment),  their 
subschemas  will  all  reference  the  same  DMCL  description. 

b.  IDMS  Directory 

• IDMS  is  based  on  the  use  of  an  integrated  directory  in  which  reside  all 
definitions  of  schemas,  subschemas  and  DMCL  Descriptions.  Each  of  these 
definitions  names  and  defines  other  entities  such  as  schema  record  types, 
subschema  record  types,  buffers  and  so  on. 

• The  Directory  is  used  by  the  IDMS  processors  to  store  their  results  and  to 
perform  checks  on  the  validity  of  the  definitions. 

c.  IDMS  Structure 

• The  IDMS  structure  divides  the  data  base  into  areas,  pages  and  lines. 

The page is the unit of physical data transfer and can be equated with a
block, which is the unit of data transfer with conventional data
management access methods.

An area is a group of consecutively numbered pages and is the unit of
physical usage. In other words, only one or more complete areas can be
loaded on-line. It is also the unit used by IDMS for security purposes
for dumping and restoring.

A line  is  the  smallest  logical  unit  of  addressable  storage.  A page 
contains  a number  of  lines  up  to  a maximum  of  255  per  page.  Each  line 
can  be  considered  equivalent  to  a logical  record. 

• Within an IDMS data base, the ranges of page numbers are allocated to areas
with gaps left for expansion between the ranges. Associated with each record
is a data base key that uniquely identifies that record.

• For each record type the data base administrator specifies the area in which
the record is stored and the method to be used to define the data base keys.
The Data Base Administrator has three choices of how a record is assigned a
data base key and therefore how its physical location within the data base is
determined.

• A record  type  stored  in  CALC  Mode  can  be  used  as  an  entry  point  into  the 
data  base.  IDMS  applies  a randomizing  algorithm  to  the  fields  specified  by  the 
data  base  administrator  to  be  used  to  determine  the  record  location  in  the 
data  base.  The  IDMS  randomizing  algorithm  was  designed  to  provide  an  even 
spread  of  records  across  a particular  area. 

• The second placement mode is VIA SET. This method is used to group member
records as near as possible to their owner in a given set. The Data Base
Administrator specifies that a particular record type is to be stored VIA the
set. IDMS then stores member records as close as possible to their owner for a
particular set occurrence. Records stored VIA SET are not data base entry
points, and can only be accessed indirectly, generally after accessing their
owners.

• The third placement mode is DIRECT. This enables a program to specify the
page address of a record to IDMS directly. Thus, the data base administrator
can define a specific placement strategy when required.

• IDMS  does  not  currently  support  the  Indexed  Location  Mode.  The  Indexed 
Location  Mode,  which  is  part  of  the  CODASYL  specifications,  permits 
definition  of  indexed  sequential  type  placement  strategies,  where  a particular 
field  is  defined  as  the  key  for  indexing  purposes  and  the  records  are  held  in  key 
sequence. 
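
The relationship between areas, pages, data base keys and the CALC and VIA
SET placement modes can be sketched as follows. Everything named here is an
assumption made for illustration: DbKey, Area and the two functions are
hypothetical, the page range is invented, and CRC-32 merely stands in for
the IDMS randomizing algorithm, which the report does not describe.

```python
import zlib
from typing import NamedTuple


class DbKey(NamedTuple):
    page: int
    line: int   # the text gives a maximum of 255 lines per page


class Area(NamedTuple):
    name: str
    first_page: int
    last_page: int


def calc_target_page(area, calc_key):
    """Map a CALC key to a page in the area (CRC-32 stands in for the real algorithm)."""
    page_count = area.last_page - area.first_page + 1
    return area.first_page + zlib.crc32(calc_key.encode()) % page_count


def via_target_page(owner_key):
    """VIA SET placement aims for the owner's page, so members cluster near it."""
    return owner_key.page


customers = Area("CUSTOMER-AREA", first_page=1001, last_page=1500)
page = calc_target_page(customers, "CUST-000042")
owner = DbKey(page=page, line=1)
member_page = via_target_page(owner)
```

The same CALC key always maps to the same page, which is what makes a CALC
record usable as an entry point. A VIA SET member has no key of its own to
hash, so it is reached through its owner.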

d.  Set  Relationships 

• Perhaps  the  most  important  feature  of  a data  base  management  system  is  the 
ability  to  support  logical  data  relationships.  The  set  is  the  mechanism  by 
which  a logical  relationship  is  established  between  two  or  more  record  types 
and  is  the  building  block  that  allows  various  data  structures  to  be  built. 

• Each occurrence of a set includes one occurrence of an owner record type and
(possibly) multiple occurrences of one or more member types.

• Additional  power  and  flexibility  is  available  in  that  any  record  occurrence  can 
participate  in  any  set  relationships  as  either  an  owner  or  member.  This 
feature  allows  the  construction  of  complex  integrated  data  structures. 

• The  basic  characteristics  of  a set  are  as  follows: 

A set  is  a named  collection  of  record  types. 

Any  number  of  sets  may  be  declared  in  a schema. 

Each set must have one owner record type and one or more member
record types.




Set  descriptions  are  independent  of  the  LOCATION  MODE  of  the  owner 
or  member  record  types. 

Any  record  type  may  be  declared  as  the  owner  record  type  of  one  or 
more  sets. 

Any  record  type  may  be  declared  as  a member  record  type  of  one  or 
more  sets. 

Each  set  is  described  independently  of  all  other  sets. 

A record occurrence cannot appear in more than one occurrence of the
same set.

Each occurrence of a set includes one and only one occurrence of its
owner record.

A set occurrence may contain any number of member record
occurrences.

A record  may  exist  as  the  owner  of  a set  and  a member  of  another  set. 
Two  or  more  records  can  exist  as  members  of  the  same  set. 

• For  each  occurrence  of  a set,  a chain  of  pointers  is  created  that  can  be 
followed  by  IDMS  and  that  provides  for  serial  access  to  all  records  in  the  set 
occurrence. 

• The owner record of a set occurrence contains a pointer to the first member
record in the set, which in turn contains a pointer to the second member and
so on until the last member in the set, which points back to the owner. These
forward pointers are referred to as NEXT POINTERS.

• A backward pointer can be specified by including the clause LINKED TO
PRIOR with the definition of the set relationship. When this clause is used,
backward pointers (that is, pointers in the PRIOR direction) are also
provided.

• In addition, the occurrences of any of the member record types specified for a
set may be declared to be LINKED TO OWNER. This causes each of the
member record occurrences involved to point to their owner record
occurrence, so enabling the owner record to be accessed quickly without
requiring that IDMS follow all member records in the chain in either the NEXT
or PRIOR direction to reach the owner.
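
The three pointer types can be pictured as a ring. The sketch below is a
minimal in-memory model, not the IDMS storage format: the class Record and
the functions connect_last and members_next are hypothetical names, and all
three pointers are always maintained here, whereas in IDMS the PRIOR and
OWNER pointers exist only when the corresponding clauses are specified.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.next = self     # NEXT pointer (always present)
        self.prior = self    # PRIOR pointer (LINKED TO PRIOR)
        self.owner = self    # OWNER pointer (LINKED TO OWNER)


def connect_last(owner, member):
    """Link a member at the end of the owner's ring, maintaining all three pointers."""
    last = owner.prior
    last.next = member
    member.prior = last
    member.next = owner
    owner.prior = member
    member.owner = owner


def members_next(owner):
    """Follow NEXT pointers from the owner until the chain returns to it."""
    names, current = [], owner.next
    while current is not owner:
        names.append(current.name)
        current = current.next
    return names


dept = Record("DEPT-10")
for emp_name in ("EMP-1", "EMP-2", "EMP-3"):
    connect_last(dept, Record(emp_name))
```

With only NEXT pointers, reaching the owner from EMP-1 means walking past
EMP-2 and EMP-3. The OWNER pointer makes that a single step, and the PRIOR
pointer lets the last member be found directly from the owner.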

e.  Set  Membership  Options 

• Set membership types indicate the manner in which a record is established (or
removed) as a member of a set and the restrictions on the use of the
INSERT/REMOVE DML statement (CONNECT/DISCONNECT). A set
membership may be described as either MANDATORY or OPTIONAL and also
AUTOMATIC or MANUAL.

• The  MANDATORY  or  OPTIONAL  part  of  the  set  membership  option  describes 
the  control  over  whether  a record  must,  or  need  not,  participate  as  a member 
of  a set. 


MANDATORY  means  that,  once  the  membership  of  a record  occurrence 
in  a set  is  established,  the  membership  is  permanent.  Its  logical 
position  may  be  changed  indirectly  by  a MODIFY  DML  statement,  but 
the  record  occurrence  cannot  be  REMOVED  from  the  set. 


OPTIONAL means that the membership of a record occurrence in a set
may be terminated by a REMOVE DML statement. In this instance only
the set membership is removed, while the record occurrence itself
remains in the data base. The record occurrence is still accessible in
other ways, but not through the set from which it was just removed.

• The AUTOMATIC or MANUAL part of the set membership type describes
control over the establishment of set membership for a record occurrence.

AUTOMATIC means that membership in a set is established
automatically by the system whenever an occurrence of a member record is
stored in a data base using the STORE DML statement.

MANUAL means that set membership is not established when a member
record occurrence is stored in the data base but may later be
established by an application program issuing an INSERT DML
statement.
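
The interaction of the two option pairs with STORE, INSERT and REMOVE can be
sketched as a small state model. The class SetMembership and its methods are
hypothetical and deliberately simplified: one set occurrence, one member
record type, and Python lists in place of the data base.

```python
class SetMembership:
    """Toy model of one member record type's options in one set."""

    def __init__(self, automatic, mandatory):
        self.automatic = automatic   # AUTOMATIC (True) or MANUAL (False)
        self.mandatory = mandatory   # MANDATORY (True) or OPTIONAL (False)
        self.members = []            # records currently connected to the set
        self.stored = []             # records present in the data base

    def store(self, record):
        """STORE: add the record; connect it only if membership is AUTOMATIC."""
        self.stored.append(record)
        if self.automatic:
            self.members.append(record)

    def insert(self, record):
        """INSERT: connect an already stored record to the set."""
        if record in self.stored and record not in self.members:
            self.members.append(record)

    def remove(self, record):
        """REMOVE: disconnect from the set; refused for MANDATORY membership."""
        if self.mandatory:
            raise ValueError("MANDATORY member cannot be REMOVEd from the set")
        self.members.remove(record)   # the record itself stays in the data base


manual_optional = SetMembership(automatic=False, mandatory=False)
manual_optional.store("ORDER-1")
after_store = list(manual_optional.members)    # not yet a member
manual_optional.insert("ORDER-1")
after_insert = list(manual_optional.members)   # now connected
manual_optional.remove("ORDER-1")              # disconnected, still stored
```

The two choices are independent: AUTOMATIC or MANUAL governs how membership
begins, and MANDATORY or OPTIONAL governs whether it can end.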

f.  Set  Order 

• Set order specifies the logical order of member record occurrences within
each occurrence of a set and is independent of physical placement of the
records themselves.

Set order FIRST specifies that a new member record occurrence is to
be positioned such that it will be the first record encountered in the
NEXT direction of the set.

Set order LAST means that a new record occurrence is positioned in the
set such that it will be the last record encountered in the NEXT
direction of the set.

Set order NEXT means that a new record occurrence is positioned
immediately following (in the NEXT direction) the record occurrence
last selected by the user.

Set order PRIOR means that a new record occurrence is positioned
immediately before (in the PRIOR direction) the record occurrence last
selected by the user.

Set order SORTED specifies that a new record occurrence is to be
logically positioned in the set in ascending or descending sequence
based on the value of a specified field (group or elementary level).

• For a SORTED set, the sequence can be specified as either ASCENDING or
DESCENDING. The member records in the set relationship will be logically
sequenced according to the value of the field (data-item) used as the
ordering key.

• Set order SORTED requires an indication of how IDMS is to handle records
with duplicate values in the ordering key. The options are DUPLICATES FIRST,
DUPLICATES LAST and DUPLICATES NOT ALLOWED.

The DUPLICATES FIRST option indicates that a record with a
duplicate key will be accepted and positioned immediately before the
first existing record with the same duplicate key in the logical sequence
of the set.

The DUPLICATES LAST option positions the record with the duplicate
key immediately after the last existing record with that duplicate
value.

The DUPLICATES NOT ALLOWED option indicates that a record with
a duplicate key field will not be accepted or positioned in the set
occurrence.
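
The set order rules above reduce to choosing an insertion index. The sketch
below assumes an ascending SORTED sequence and represents a set occurrence
as a Python list of ordering keys in the NEXT direction. The function name
position_for and its parameters are hypothetical.

```python
from bisect import bisect_left, bisect_right


def position_for(order, keys, current=None, new_key=None, duplicates="LAST"):
    """Return the index at which a new member is placed in a set occurrence.

    keys    -- ordering keys of existing members, in the NEXT direction
    current -- index of the member last selected (used by NEXT and PRIOR)
    """
    if order == "FIRST":
        return 0
    if order == "LAST":
        return len(keys)
    if order == "NEXT":
        return current + 1
    if order == "PRIOR":
        return current
    if order == "SORTED":                       # ascending sequence assumed
        if new_key in keys and duplicates == "NOT ALLOWED":
            raise ValueError("duplicate key rejected")
        if duplicates == "FIRST":
            return bisect_left(keys, new_key)   # before existing equal keys
        return bisect_right(keys, new_key)      # after existing equal keys
    raise ValueError(f"unknown set order: {order}")


keys = [10, 20, 20, 30]
dup_first = position_for("SORTED", keys, new_key=20, duplicates="FIRST")
dup_last = position_for("SORTED", keys, new_key=20, duplicates="LAST")
after_current = position_for("NEXT", keys, current=1)
```

FIRST, LAST, NEXT and PRIOR need no key comparison at all. Only SORTED
depends on the data, which is why it alone needs a rule for duplicates.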

g.  Maintenance  Of  Set  Relationships 

• The  establishment  and  maintenance  of  all  relationships  between  records, 
specified  by  means  of  declaring  sets  in  the  schema,  is  handled  automatically 
by  IDMS.  Maintenance  is  required  when: 




A record  that  has  been  declared  as  an  owner  or  member  in  one  or  more 
sets  is  added  to  or  deleted  from  the  data  base. 


A record  is  explicitly  inserted  or  removed  from  a set. 


A record  is  modified  in  a way  that  changes  its  logical  position  in  the  set 
- for  example,  modification  of  the  control  key  value  of  a sorted  set. 


h.  Data  Manipulation  Language  (DML) 

• Not all of the DML syntax described in the CODASYL Data Base
Specifications has been implemented by IDMS. However, the latest versions of
IDMS on IBM systems, ICL systems, and DEC systems (DBMS 11) implement the
updated DML syntax published by the CODASYL Data Base Language Task
Group (DBLTG) in March 1973.

i.  IDMS  Data  Directory 

• IDMS  uses  an  Integral  Data  Directory,  which  acts  as  a repository  of  all  schema 
definitions,  DMCL  definitions  and  subschema  definitions. 

• A group of eight comprehensive Data Directory reports can be produced by
IDMS. These reports provide system documentation for use by programmers,
analysts, Data Base Administrators, Data Processing Management and User
Department Personnel.

• In addition to the Data Directory provided by IDMS, several other
independently developed Data Dictionaries are available for use with IDMS.
One of these is DATAMANAGER, which was developed by Management Systems
Programming Ltd. of the U.K. and the U.S.A.
2.  BASIC FUNCTIONAL CAPABILITIES


a.  User  Accessibility 

The IBM version of IDMS supports ASSEMBLER, PL/1 and FORTRAN through
a CALL  macro  interface.  The  full  COBOL  Data  Manipulation  Language  is 
supported  by  both  the  IBM  version  of  IDMS  and  also  the  ICL  version  using  2900 
COBOL. 


The  IBM  version  supports  CULPRIT  as  a commercial  end  user  language.  It  is 
not  known  whether  the  ICL  version  supports  CULPRIT. 


The data communications support initially provided by the IBM version of
IDMS was the Transaction Processing System, using the General
Communications Interface offered by Cullinane. IDMS also provides an
interface to IBM CICS/VS and to ALTERGO's SHADOW II.

The  ICL  IDMS  provides  Data  Communication  support  through  the  2900  VME/B 
operating  systems. 


b.  Multiple  Views  of  Data 

Both the ICL and the IBM versions of IDMS support sequential, random and a
limited indexed capability. However, IDMS version 4.0 does not support
SORTED INDEXED, nor does it support multiple indices.

c.  Data  Consolidation 


• Variable length entities are supported through standard CODASYL set
relationships. These provide for an unlimited number of repeating group
types per entity (member records per data base record).


• Similarly,  there  is  no  limit  to  the  number  of  occurrences  per  repeating  group 
type  (member  record)  within  a set  relationship. 
• Earlier versions of IDMS on IBM systems did not support variable length
member record occurrences. However, version 4.0 provides support for
variable length occurrences through the OCCURS DEPENDING ON clause.

• There  is  no  limit  in  the  number  of  nested  levels  of  set  relationships  that  may 
be  supported  by  IDMS. 

• In considering any limits on the number of entity (data base record)
relationships, IDMS supports an unlimited number of relationships per data
base, per data base record, and per program.
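The owner/member idea behind these set relationships can be pictured with a small sketch. The Python below is only an illustration, not IDMS code; the class `Record`, its methods, and the record and set names (DEPARTMENT, EMPLOYEE, DEPT-EMP and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A data base record that may own any number of named sets."""
    record_type: str
    data: dict
    sets: dict = field(default_factory=dict)  # set name -> list of member records

    def connect(self, set_name, member):
        """Insert a member record into one of this owner's sets."""
        self.sets.setdefault(set_name, []).append(member)
        return member

    def members(self, set_name):
        """Return the member occurrences of a set (empty list if none)."""
        return self.sets.get(set_name, [])


# One owner with two repeating group types (two sets) and a nested set.
dept = Record("DEPARTMENT", {"name": "ACCOUNTS"})
emp = dept.connect("DEPT-EMP", Record("EMPLOYEE", {"name": "JONES"}))
dept.connect("DEPT-EMP", Record("EMPLOYEE", {"name": "SMITH"}))
dept.connect("DEPT-PROJECT", Record("PROJECT", {"code": "P1"}))
emp.connect("EMP-SKILL", Record("SKILL", {"code": "COBOL"}))  # nested level
```

Nothing in the sketch fixes the number of sets per owner, members per set, or nesting depth, which mirrors the absence of limits described above.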

3.  DATA  INDEPENDENCE 

a.  Levels  of  Mapping 

• Effectively, IDMS provides only two levels of mapping to ensure data
independence. These are an Internal map (the schema) and an External map
(the subschema).

• However,  the  ability  of  CODASYL  implementations  to  permit  entry  into  the 
data  base  through  any  record  whose  location  mode  is  CALC,  DIRECT  or 
INDEXED  does  permit  the  logical  restructuring  of  a data  base  and  so  provides 
a limited  Conceptual  map. 

• Neither the IBM nor the ICL version of IDMS supports format translation of
fields. The subschema, however, can be used to define only those fields in a
record relevant to a program, to ensure that the program is unaffected by
other fields in the record not defined in the subschema.
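The way a subschema shields a program from fields it does not name can be sketched as a simple projection. This is an illustrative Python sketch under assumed names (`apply_subschema`, the field names and values are hypothetical), not the IDMS mechanism itself.

```python
def apply_subschema(record, subschema_fields):
    """Return only the fields of a record that the subschema names."""
    return {name: record[name] for name in subschema_fields if name in record}


# Full record as held in the data base (schema view); values are invented.
stored = {"EMP-NO": "1001", "NAME": "JONES", "SALARY": 18500, "DEPT": "A12"}

# Hypothetical subschema for a mailing program: no salary information.
mailing_view = apply_subschema(stored, ["EMP-NO", "NAME"])

# A field added to the record later does not change what the program sees.
stored["PHONE"] = "555-0100"
mailing_view_after = apply_subschema(stored, ["EMP-NO", "NAME"])
```

The program's view is identical before and after the new field is added, which is the limited form of data independence the subschema provides.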

b.  Data  Base  Changes 

• Change Device Type: A change of device type with IDMS does not require any
logic change in the program or program recompilation. The data base only
needs to be reloaded for those areas to be transferred to the new device type.

• Change Access Method: As IDMS is implemented on a Direct Access Method,
current versions of IDMS do not permit a change of access method.

• Change Entity View: A change in the logical structure of a data base record
(made up of several set relationships and member records) requires a change
in logic and program recompilation only for those Add/Delete programs
required for data base maintenance. Such a restructuring would require
reloading of the relevant parts of the data base.

• Add New Entity: The addition of a new data base record does not require a
change in logic or program recompilation. However, it may require reloading
of that part of the data base to contain the new data base record extension.

• Add New Repeating Group Type: As for a change of entity view, the addition
of a new repeating group type may require a program logic change and
recompilation for add and delete programs that are used for data base
maintenance and must be aware of the entire data base structure. Similarly,
those parts of the data base affected may need to be reloaded.

• Add New Relationships: The addition of new set relationships to the data
base will affect add and delete programs used for data base maintenance.
Such programs may require changes in logic and program recompilation, as
well as reloading of the relevant parts of the data base.

• Add New Field to Repeating Group: The addition of a new field to a repeating
group (record) can be accommodated within the subschema. This will require
no program logic change but generally will require program recompilation
and, of course, a reload of the relevant part of the data base containing the
record to which the new field is added.

• Change Field Format: A change of field format will most likely require a
change in program logic or ASSEMBLER programs, followed by program
recompilation and a data base reload of the affected parts of the data base.
4.  DATA INTEGRITY


a.  Exclusive  Control 


Exclusive  control  across  partitions  (interpartition)  at  the  lowest  level  is  at  the 
physical  block  (page).  The  lowest  isolated  level  of  exclusive  control  is  an  area 
which  can  be  one  or  more  data  sets. 

Within a partition (intrapartition), such as in a CICS-IDMS environment, the
lowest level of exclusive control is again the block (page) and the lowest
isolated level is the area. This applies to the IBM version of IDMS;
intrapartition exclusive control is not relevant on the ICL version of IDMS.


Deadlock  is  possible  with  IDMS  - both  the  IBM  and  ICL  versions.  The 
programmer  is  responsible  for  establishing  exclusive  control,  maintaining 
isolation  and  resolving  deadlocks. 
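To illustrate how page-level exclusive control can lead to deadlock, the following Python sketch models a toy lock table that reports when a request would close a cycle of waiting programs. It is an assumption-laden illustration only: the class `PageLocks`, its method names, and the program names are hypothetical, and IDMS itself leaves this responsibility to the programmer as stated above.

```python
class PageLocks:
    """Toy page-level exclusive lock table with wait-for cycle detection."""

    def __init__(self):
        self.owner = {}    # page number -> program holding it
        self.waiting = {}  # program -> program it is waiting for

    def request(self, program, page):
        """Return 'granted', 'wait', or 'deadlock' for an exclusive request."""
        holder = self.owner.get(page)
        if holder is None or holder == program:
            self.owner[page] = program
            return "granted"
        # Follow the chain of waits starting at the holder; reaching the
        # requester means that waiting would close a cycle.
        seen = holder
        while seen is not None:
            if seen == program:
                return "deadlock"
            seen = self.waiting.get(seen)
        self.waiting[program] = holder
        return "wait"


locks = PageLocks()
results = [
    locks.request("PROG-A", 10),  # A holds page 10
    locks.request("PROG-B", 20),  # B holds page 20
    locks.request("PROG-A", 20),  # A must wait for B
    locks.request("PROG-B", 10),  # B would wait for A: a cycle
]
```

A program that receives the deadlock indication would typically release its pages and retry; acquiring pages in a fixed order is another common way to avoid the cycle altogether.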

5.  RECOVERY/RESTART

a.  Recovery 

After-images  are  automatically  logged  by  the  IDMS  system. 

Utility  support  is  provided  with  a Copy/Restore  utility  for  backup  and 
restoration  of  the  data  base. 

No  log  summarization  utility  is  provided  to  consolidate  only  the  most  recent 
activity  on  a log  for  the  data  base.  Instead,  data  base  recovery  is  based  on  the 
entire  log  contents. 

A utility  is  provided  to  apply  log  activity  to  the  data  base  for  data  base 
reconstruction  after  an  unrecoverable  I/O  error. 

The  smallest  recoverable  unit  of  an  IDMS  data  base  is  an  area. 
b.  Batch Restart


• Before-images  are  automatically  logged  by  the  IDMS  system. 

• A utility  is  provided  in  the  IBM  version  of  IDMS  to  close  off  a log  tape  that 
could  not  be  normally  closed  by  IDMS  due  to  a system  failure. 

• A backout utility is provided in each version of IDMS to back out the
partial processing of an uncompleted batch program.

• While  intermediate  restart  points  can  be  defined  in  ICL  batch  programs, 
earlier  versions  of  the  IBM  version  of  IDMS  did  not  provide  support  for 
intermediate  batch  restart  points.  It  is  not  known  if  this  support  is  provided  in 
later  versions. 
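The roles of after-images (recovery) and before-images (backout) can be sketched as follows. This is a minimal Python illustration, not the IDMS log format; the function names and the dictionary standing in for the data base are hypothetical, and a before-image of `None` is assumed to mean that the record did not previously exist.

```python
def update(db, log, key, new_value):
    """Apply an update and log the before- and after-image of the record."""
    log.append({"key": key, "before": db.get(key), "after": new_value})
    db[key] = new_value


def back_out(db, log):
    """Undo logged updates in reverse order using before-images."""
    for entry in reversed(log):
        if entry["before"] is None:
            db.pop(entry["key"], None)  # record did not exist before
        else:
            db[entry["key"]] = entry["before"]


def roll_forward(backup, log):
    """Rebuild a data base from a backup copy by reapplying after-images."""
    db = dict(backup)
    for entry in log:
        db[entry["key"]] = entry["after"]
    return db


backup = {"A": 1, "B": 2}  # copy taken by a backup utility
db, log = dict(backup), []
update(db, log, "A", 10)
update(db, log, "C", 30)
recovered = roll_forward(backup, log)  # recovery after loss of the data base
back_out(db, log)                      # restart after a failed batch program
```

Roll-forward reads the whole log from the backup point, which corresponds to recovery being based on the entire log contents; backout walks the same log in reverse.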

c.  On-Line  Restart 


• While message logging is provided automatically by the ICL version of IDMS,
such support in the IBM version depends on the particular Data
Communications package used with IDMS. In the case of CICS/VS used with
IDMS, message logging apparently is supported only with IBM's SNA terminals.

• Backout  of  an  incompletely  processed  task  is  provided  in  the  ICL  version  and 
also  in  the  IBM  version  of  IDMS  when  CICS/VS  is  used  as  the  Data 
Communications  package.  The  support  provided  permits  a system  restart 
following  an  on-line  failure. 

6.  DATA  SECURITY 

• Data  Security  is  provided  through  a password  mechanism.  At  present,  this 
mechanism  is  rather  restrictive  in  the  degree  of  security  that  it  can  enforce. 
The  only  password  accepted  is  "YES,"  which  provides  or  prevents  access  to  the 
appropriate  part  of  the  data  base. 
• The password security is provided in the subschema only. The full PRIVACY
LOCKS capability of CODASYL in the schema and subschema is not supported
by IDMS. Thus, it is not possible with the current version of IDMS to define a
lock-name into which a program can move an appropriate password for
checking by IDMS.

• The  restriction  level  is  at  the  repeating  group  type  (record). 

• The  access  security  supported  by  IDMS  permits  control  of  read,  update,  add, 
delete  or  exclusive  access. 

• Security  is  enforced  in  the  subschema  by  the  Data  Base  Administrator.  The 
Data  Base  Administrator  indicates  whether  a program  is  permitted  access  to 
certain  elements  of  the  data  base  or  certain  functions  through  specification  of 
a privacy  lock. 

7.  EASE  OF  USE 

a.  Data  Base  Administrator 

• The Data Description Language type used by IDMS is the CODASYL
procedural DDL.

• Current  information  indicates  that  no  data  base  design  aids  are  provided  with 
IDMS,  and  it  appears  that  only  the  ICL  version  provides  data  base  performance 
measurement  tools. 

• An  integral  part  of  both  versions  of  IDMS  is  the  integrated  Data  Dictionary  for 
documentation  and  control.  In  addition,  DATAMANAGER,  a Data  Dictionary 
provided  by  Management  Systems  Programming,  a U.K.  software  company,  is 
available  with  IDMS. 
No utilities appear to be available with the IBM version of IDMS for data base
restructuring. The ICL version does, however, appear to provide some utility
support in this area.

A DBOMP  bridge  is  provided  for  conversion  to  IDMS  in  the  IBM  version. 

Very good education is provided at the externals level of IDMS, with some
internals education provided. The documentation provided with IDMS is
excellent, with clearly documented user guides and reference manuals
containing liberal use of examples.

b.  Application  Programmer 

The Data Manipulation Language supported by IDMS is the CODASYL DML for
COBOL. Additionally, the IBM version supports ASSEMBLER, PL/I and
FORTRAN programs through a CALL macro interface.

IDMS  supports  13  operation  commands.  Only  one  record  type  per  command 
can  be  referenced  by  IDMS. 

Support  does  not  appear  to  be  available  for  a high  or  low  data  search 
capability.  No  support  is  provided  for  Boolean  search  or  multiple  record  type 
search  requirements. 

c.  End  User  Tools 


The  end  user  language  supported  by  the  IBM  version  of  IDMS  is  CULPRIT. 
This  is  used  in  a batch  environment  for  retrieve  operations  only  against  an 
IDMS  data  base  and  provides  a high  level  query  language  support  capability. 

A Bill  of  Material  application  program  is  available  for  execution  with  the  IBM 
version  of  IDMS.  Similarly,  MMS  General  Ledger  - an  independent  software 
house  general  ledger  program  - does  execute  under  control  of  IDMS. 
8.  COST/PERFORMANCE


a.  Measurable  Costs 


The IBM version of IDMS costs approximately $2,375 per month ($48,000
purchase). Other options are as follows (approximately):

                                  Per Month      Purchase

Central Version (extra cost)      $850           $17,000

On-Line Query                     $5,000         $10,000

Data Dictionary                   $675           $13,500

CULPRIT                           $1,150         $22,500

The  ICL  version  costs  are  comparable. 

b.  Real  Memory 

IDMS requires approximately 75K bytes of real storage for the first user
(partition) in the IBM version. Subsequent partitions require an additional
15-20K bytes for the OS version of IDMS and an additional 75K bytes for the
DOS version. The ICL version, it is understood, requires 10K words,
equivalent to 40K bytes.
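As a worked example of the storage figures just quoted, the Python sketch below estimates real storage for a given number of partitions on the IBM version. The function name and the choice of four partitions are illustrative assumptions; the constants are the approximate figures from the text.

```python
def idms_real_storage_kb(partitions, system="OS"):
    """Estimate real storage in K bytes for the IBM version of IDMS.

    Uses the approximate figures quoted in the text: 75K for the first
    partition, then 15-20K (OS) or 75K (DOS) for each additional partition.
    Returns a (low, high) range.
    """
    if partitions < 1:
        raise ValueError("at least one partition is required")
    first = 75
    extra_low, extra_high = {"OS": (15, 20), "DOS": (75, 75)}[system]
    additional = partitions - 1
    return (first + additional * extra_low, first + additional * extra_high)


os_estimate = idms_real_storage_kb(4, "OS")    # 75 + 3 * (15 to 20)
dos_estimate = idms_real_storage_kb(4, "DOS")  # 75 + 3 * 75
```

For four partitions the estimate is 120K to 135K bytes under OS and 300K bytes under DOS, which shows how quickly the DOS figure grows with each added partition.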

c.  Performance  Constraints 

Earlier versions of the IBM version of IDMS provided only single-thread
processing. However, it is understood that later IBM versions, as well as the
ICL version, support a multithread processing capability.
The exclusive control lockout level is provided at the block (page) level.

• A common  input/output  buffer  pool  is  provided.  In  considering  performance 
control  through  data  grouping,  considerable  flexibility  is  offered  to  the  Data 
Base  Administrator  to  control  the  grouping  of  the  related  records,  with  VIA 
SET.  There  is  no  limit  to  the  number  of  repeating  group  types  (records)  per 
dataset  or  datasets  per  data  entity  (areas  per  data  base  record). 

• Support  is  provided  for  direct  relationships  through  LOCATION  MODE  IS 
DIRECT. 

• The  access  method  employed  is  a random  access  method  (DAM  on  the  IBM 
version). 
F.  IMS - DL/I (IBM)


1.  GENERAL DESCRIPTION

Data Language/I (DL/I) is IBM's prime data base system. It supports
System/360 and System/370 Models 115 to 168 and Models 3031, 3032 and
3033. The number of users throughout the world exceeds 3,000.

The operating systems supported are DOS and OS for System/360 as well as
DOS/VS, OS/VS1 and OS/VS2 for System/370.

IBM offers three versions of DL/I for System/370 computers.

DL/I ENTRY, which provides a subset of the full DL/I facilities and is
intended for System/370 Models 115-125.

DL/I DOS/VS, a full function version of DL/I intended for System/370
Models 135-148.

DL/I OS/VS, the full function DL/I support provided by Information
Management System (IMS), which is intended for use on System/370
Models 135-168.

Each  of  the  three  IBM  DL/I  versions  will  be  discussed  together  in  relation  to 
each  of  the  evaluation  criteria  described  in  Section  III.  This  will  enable  a 
comparison  to  be  made  between  the  different  levels  of  function  offered  by 
each  DL/I  product  as  well  as  enabling  comparisons  to  be  made  between  DL/I 
and  other  DBMS  products. 

While  DL/I  is  primarily  a hierarchical  data  base  structure,  additional  facilities 
enable  DL/I  to  support  network  structures  and  inverted  structures. 
The design of a Personnel Data Base System is used in this description to
illustrate the various capabilities provided by IBM's Data Language/I.
The  data  base  needs  of  a Personnel  system  containing  a Skills  data  base 
and  a Payroll  data  base,  and  including  skills  information,  personnel 
details  and  salary  information,  is  representative  of  many  applications 
and  many  industry  requirements  and  can  be  used  to  represent  typical 
data  processing  needs  for  a DBMS  system. 

A description of the information in the Skills data base follows. Each block in
Exhibit IV-F1 represents a group of related fields. These are described as
follows:

SKILL:  Represents an entry for each skill category (example: DENTIST) in
the data base.


NAME:  The  name  of  each  person  having  a particular  skill. 
EDUC:  The  education  a person  has  in  a particular  skill. 
EXPR:  Any  experience  a person  has  at  a particular  skill. 


ADDR:  The  person's  address. 

SALARY:  The  person's  payroll  information. 

• The  Payroll  data  base  contains  similar  information  to  the  Skills  data  base  but 
presents  it  in  a different  form  to  permit  easy  reference  to  name,  address  and 
salary  information.  Each  block  in  the  Payroll  data  base  shown  in  Exhibit  IV-F2 
represents  a group  of  related  fields.  These  blocks  contain  the  following 
information: 


NAME:  An  entry  for  each  person  on  the  payroll. 

EXHIBIT IV-F1

SKILLS DATA BASE ORGANIZATION


EXHIBIT IV-F2

PAYROLL DATA BASE ORGANIZATION


ADDR:  The person's address.


SALARY:  The  person's  payroll  information. 

SKILL:  An  entry  for  each  skill  possessed  by  a person. 

EDUC:  Education  related  to  the  particular  skill. 

EXPR:  Experience related to the particular skill.

a.  DL/I Data Base Design

• IBM's Data Language/I (DL/I) permits the Skills data base and the Payroll
data base to be implemented as separate physical data bases.

Each  physical  data  base  then  provides  a different  view  of  the  data  to 
meet  the  differing  requirements  - the  Skills  data  base  permitting  access 
to  the  information  based  on  skill  and  the  Payroll  data  base  permitting 
access  to  the  information  based  on  name. 

• DL/I provides a number of additional facilities that reduce the data
redundancy that would occur if each of the above data bases were implemented
as a complete physical data base.

DL/I  allows  the  name,  address  and  salary  information  related  to  a 
particular  person  to  be  separated  from  the  name,  education  and 
experience  information  related  to  a particular  skill. 

This  separation  can  be  made  as  two  distinctly  different  physical  Data 
Bases,  as  illustrated  in  Exhibit  IV-F3. 

Thus, SKILL, EDUC, and EXPR appear only once in the Skills data base,
and ADDR and SALARY information appears only once in the Payroll
data base.
EXHIBIT IV-F3

SEPARATE SKILLS AND PAYROLL DATA BASES

SKILLS DATA BASE        PAYROLL DATA BASE


However, the NAME information is required for both data bases. DL/I enables
that NAME information to reside in the Payroll data base and be referenced
from the Skills data base without the NAME information being physically
stored in the Skills data base.

This  cross-referencing  is  referred  to  by  DL/I  as  a logical  relationship 
and  is  shown  in  Exhibit  IV-F3  by  a double-headed  arrow  joining  the  two 
NAME  blocks. 

In  effect,  the  NAME  information  resides  in  the  Payroll  data  base  and 
that  NAME  information  is  pointed  to  from  the  Skills  data  base. 

The pointer in the Skills data base is generated and fully maintained by
DL/I. Thus, data redundancy is avoided, at the cost of a slight increase in
data storage: DL/I must store a direct pointer in the Skills data base to
the NAME information in the Payroll data base.
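The effect of a logical relationship can be sketched in a few lines of Python. This is an illustration only: keys stand in for DL/I direct pointers, address and salary are folded into the NAME entry for brevity, and all names and values are hypothetical.

```python
# Payroll data base: NAME data is physically stored here, keyed by full name.
payroll = {
    "JONES": {"fullname": "JONES", "addr": "12 HIGH ST", "salary": 18500},
    "SMITH": {"fullname": "SMITH", "addr": "4 MILL RD", "salary": 21000},
}

# Skills data base: each skill holds only pointers (here, keys) to NAME data.
skills = {
    "DENTIST": {"name_pointers": ["SMITH"]},
    "PROGRAMMER": {"name_pointers": ["JONES", "SMITH"]},
}


def names_for_skill(skill_code):
    """Resolve the pointers under a skill into the NAME data held in Payroll."""
    return [payroll[ptr] for ptr in skills[skill_code]["name_pointers"]]


# Updating the single stored copy is visible through both views.
payroll["SMITH"]["addr"] = "9 PARK LANE"
dentists = names_for_skill("DENTIST")
programmers = names_for_skill("PROGRAMMER")
```

Because the Skills side stores only a pointer, a change of address is made once and is seen from either data base, at the cost of the pointer storage noted above.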

While this information is implemented as two separate physical DL/I data
bases, the three-level mapping provided by DL/I allows the application
programmer to view the data base in either of two forms: as a consolidated
Skills data base or as a consolidated Payroll data base.

The actual physical implementation of these two data bases is transparent
to the application programmer. The application programmer views these two
physical data bases as if they were a single physical data base, with the
view shown depending on whether the program requires skills information or
payroll information.

The  advantage  which  DL/I  offers  in  data  base  design  is  the  ability  for  the  data 
base  designer  to  choose  that  design  best  suited  to  the  application  processing 
requirements  at  the  time  and  yet  be  able  to  change  that  data  base  design  as 
the  application  processing  needs  dictate. 
This change can generally be made without requiring program modification.
The application program always views the data base through its own logical
structure, not through the physical organization.


• Additionally, DL/I enables secondary indexes to be established such that,
while the physical data bases are organized in SKILL sequence or NAME
sequence respectively, access can be made directly on the basis of education
information or salary information (for example) by defining secondary
indexes on this information. This is shown in Exhibit IV-F4.


The result of the secondary indexes is to enable the application program
to view the data base differently, as if it were physically organized as
shown in the two logical structures in Exhibits IV-F5 and IV-F6. These
illustrate the logical structure presented based on an education
secondary index, as well as the logical structure presented based on a
salary secondary index.

• The effect of these secondary indexes is to allow the target block (segment)
to appear as if it were the root (basic) segment (that is, EDUC or SALARY),
with the other information shown as dependent information on the
appropriate root segment.


Thus,  the  EDUC  root  segment  permits  the  program  to  view  the  name 
and  skill  information  and  additionally  (through  the  logical  relationship 
to  the  Payroll  data  base)  view  the  address  and  salary  information 
related  to  that  name. 

Similarly,  with  the  SALARY  segment  now  appearing  logically  as  a root 
segment,  the  application  program  is  able  to  view  the  name,  skill, 
education  and  experience  segments  related  to  that  particular  person 
with  the  specific  salary,  by  means  of  the  logical  relationship  established 
between  the  Payroll  and  Skills  physical  data  bases. 
EXHIBIT IV-F4

SKILLS AND PAYROLL DATA BASES WITH SECONDARY INDEXES

SKILLS DATA BASE        PAYROLL DATA BASE


EXHIBIT IV-F5

LOGICAL STRUCTURE BASED ON EDUCATION SECONDARY INDEX


EXHIBIT IV-F6

LOGICAL STRUCTURE BASED ON SALARY SECONDARY INDEX

The use of logical relationships and secondary indexes permits an application
program to view a data base in many different ways.

DL/I determines the appropriate data bases to be accessed and presents
the information from the logical structure as requested by the
application program.

DL/I  follows  whatever  direct  pointers  have  been  defined  for  the  data 
base  to  establish  the  logical  relationships  and  secondary  indexes. 

This  is  done  transparent  to  the  application  program,  which  is  divorced 
from  all  considerations  regarding  the  physical  organization  of  the  data 
base. 


The  application  program  is  not  even  necessarily  aware  of  the  fact  that  there 
may  be  a secondary  index  established  (on  education,  for  example). 

However, when the application program references the EDUC segment
(and fields within the education segment) as part of a search argument,
DL/I automatically determines that it can access the education
information more efficiently by applying the search argument to the
education secondary index, and then directly accesses those education
segments which satisfy the search criteria.


This  avoids  the  necessity  of  sequentially  searching  through  the  entire 
data  base  examining  each  education  segment  in  turn  to  determine 
whether  it  meets  the  search  criteria. 
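The difference between a sequential search and a search through a secondary index can be sketched as follows. This Python fragment is a simplified illustration, not DL/I: the segment values, the function names, and the use of a dictionary as the index are all assumptions.

```python
from collections import defaultdict

# Hypothetical EDUC segments: (segment id, GRADLEVL, SCHOOL).
educ_segments = [
    (0, "BS", "STATE UNIV"),
    (1, "MS", "CITY COLLEGE"),
    (2, "BS", "CITY COLLEGE"),
    (3, "PH", "STATE UNIV"),
]


def sequential_search(gradlevl):
    """Examine every EDUC segment in turn (no secondary index)."""
    return [seg for seg in educ_segments if seg[1] == gradlevl]


def build_index(segments):
    """Build a secondary index: GRADLEVL value -> positions of target segments."""
    index = defaultdict(list)
    for position, (_, gradlevl, _) in enumerate(segments):
        index[gradlevl].append(position)
    return index


def indexed_search(index, gradlevl):
    """Go directly to the segments the index points at."""
    return [educ_segments[pos] for pos in index.get(gradlevl, [])]


educ_index = build_index(educ_segments)
bs_via_index = indexed_search(educ_index, "BS")
```

Both searches return the same segments; the indexed form touches only the qualifying segments, and the calling program does not need to know which path was taken.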


b.  Data  Definition  Language 


The  following  is  a sample  of  the  DL/I  Data  Definition  Language  necessary  to 
define  the  two  physical  data  bases  and  the  two  logical  data  base  structures 
introduced  above. 
• The Physical Skills Data Base defined in Exhibit IV-F7 has been specified as
Data Base name (DBD NAME) SKILLSP, which physically resides on a 3330
drive in the data set called SKILLSDD.

• The  DL/I  access  method  used  is  HDAM  (Hierarchical  Direct  Access  Method) 
for  efficient  random  access,  using  a randomizing  algorithm  called  RANDOM. 

• The  SKILL  segment  is  31  bytes  long  and  is  organized  in  sequence  of  the 
Skillcode  field  within  that  segment  (SKILCODE)  which  is  21  bytes  long.  The 
SKILL  segment  is  the  root  segment. 

• The  NAMEP  segment  is  defined  as  a dependent  segment  to  SKILL  (parent),  and 
also  a logical  relationship  is  established  to  the  NAME  segment  in  the 
PAYROLL  data  base. 

The  logical  relationship  pointers  established  are  LP  (Logical  Parent)  and 
LTB  (Logical  Twin  Backward). 

The  NAMEP  segment  is  organized  in  sequence  of  the  field  FULLNAME, 
which  is  20  bytes  long. 

• The  EDUC  segment  is  a dependent  segment  of  the  NAMEP  segment  defined 
earlier.  Fields  in  this  segment  are  GRADLEVL  (2  bytes)  and  SCHOOL  (20 
bytes). 


These  fields  can  be  referenced  in  search  arguments  to  enable  DL/I  to 
establish  search  criteria  on  the  content  of  these  fields  within  the  EDUC 
segment. 

Additionally,  a secondary  index  could  be  defined  with  the  EDUC 
segment  as  a target  segment.  The  secondary  index  contains  the 
GRADLEVL  and  SCHOOL  fields,  which  enable  DL/I  to  search  the 
secondary  index  based  upon  values  in  these  fields  (and  only  access  the 
EDUC  target  segment  for  those  segments  which  satisfy  the  criteria). 
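How field definitions map onto a fixed-length segment can be sketched in Python. The field lengths are those given above and the START positions are taken from Exhibit IV-F7; the helper `get_field` and the sample segment contents are hypothetical, and a character string stands in for the stored bytes.

```python
# Field layout of the EDUC segment: name -> (START, BYTES).
# START is 1-based, as in the DBD FIELD statement.
EDUC_FIELDS = {"GRADLEVL": (1, 2), "SCHOOL": (3, 20)}
EDUC_BYTES = 75


def get_field(segment, layout, name):
    """Extract one field from a fixed-length segment image."""
    start, length = layout[name]
    return segment[start - 1:start - 1 + length]


# A hypothetical 75-byte EDUC segment image, padded with blanks.
segment = "BSSTATE UNIVERSITY".ljust(EDUC_BYTES)

gradlevl = get_field(segment, EDUC_FIELDS, "GRADLEVL")
school = get_field(segment, EDUC_FIELDS, "SCHOOL").rstrip()
```

Only the fields named in the definition can be used in search arguments; the remaining bytes of the segment are carried as undefined data.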
EXHIBIT IV-F7

SKILLS DATA BASE - PHYSICAL DEFINITION


SKILLS DATA BASE (PHYSICAL)

DBD      NAME=SKILLSP,ACCESS=HDAM,RMNAME=(RANDOM,1,500)
DATASET  DD1=SKILLSDD,DEVICE=3330
SEGM     NAME=SKILL,BYTES=31
FIELD    NAME=(SKILCODE,SEQ,U),BYTES=21,START=1
SEGM     NAME=NAMEP,PARENT=((SKILL,SNGL),(NAME,P,
               PAYROLL)),BYTES=20,PTR=(LP,LTB),RULES=(VVV)
FIELD    NAME=(FULLNAME,SEQ,U),BYTES=20,START=1
SEGM     NAME=EDUC,PARENT=((NAMEP,SNGL)),BYTES=75
FIELD    NAME=GRADLEVL,BYTES=2,START=1
FIELD    NAME=SCHOOL,BYTES=20,START=3
SEGM     NAME=EXPR,PARENT=((NAMEP,DBLE)),BYTES=20
FIELD    NAME=PREVJOB,BYTES=10,START=1
DBDGEN
FINISH
END

• The EXPR segment is also defined as a dependent segment of the NAMEP
segment with a field defined as PREVJOB (10 bytes) that can be used for
searching.

• The physical Payroll data base is defined in Exhibit IV-F8 as Data Base Name
(DBD NAME) PAYROLLP and resides on a 3330 drive in the data set named
PAYRLDD. It is defined to DL/I as being supported by HIDAM (Hierarchical
Indexed Direct Access Method) with a maximum block size of 4200 bytes.

• The  root  segment  is  defined  as  the  NAME  segment,  is  150  bytes  long,  uses  twin 
backward  pointers  and  is  a parent  segment  to  two  logical  child  segments 
(LCHILD). 

One  LCHILD  segment  is  defined  as  the  INDEX  segment  in  the 
INDEXDB  Data  Base  (which  is  an  index  data  base  required  by  the 
HIDAM  DL/I  data  base  organization). 

A second  LCHILD  segment  is  defined  for  the  NAMEP  segment  in  the 
SKILLSP  physical  data  base  as  described  above.  This  establishes  the 
logical  relationship  between  the  Skills  data  base  and  the  Payroll  data 
base. 

• The  NAME  root  segment  additionally  has  a field,  FULLNAM,  which  is  used  as 
the  sequence  field  for  the  NAME  root  segment  and  is  20  bytes  long. 

Dependent  segments  defined  with  the  NAME  root  segment  as  parent  are 
the  ADDR  (200  bytes)  and  SALARY  (100  bytes)  segments. 

Both  of  these  segments  are  defined  such  that  new  segments  added  to 
the  data  base  under  a particular  NAME  are  placed  in  front  of  (FIRST) 
existing  segments. 
EXHIBIT IV-F8

PAYROLL DATA BASE - PHYSICAL DEFINITION


PAYROLL DATA BASE (PHYSICAL)

DBD      NAME=PAYROLLP,ACCESS=HIDAM
DATASET  DD1=PAYRLDD,DEVICE=3330,BLOCK=4200
SEGM     NAME=NAME,BYTES=150,PTR=TB,RULES=(VPV)
LCHILD   NAME=(INDEX,INDEXDB),PTR=INDX
LCHILD   NAME=(NAMEP,SKILLSP),PTR=DBLE
FIELD    NAME=(FULLNAM,SEQ,U),BYTES=20,START=1
SEGM     NAME=ADDR,PARENT=NAME,BYTES=200,RULES=(,FIRST)
SEGM     NAME=SALARY,PARENT=NAME,BYTES=100,RULES=(,FIRST)
DBDGEN
FINISH
END


© 1978 by INPUT, Menlo Park, CA 94025. Reproduction Prohibited.


Thus,  the  address  and  salary  information  can  be  organized  on  a 
historical  basis  with  the  most  recent  address  and  salary  organized  first 
and  earlier  address  and  salary  segments  following. 
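The effect of placing new occurrences first can be sketched with a Python list standing in for the chain of SALARY segments under one NAME. The helper name and the salary values are hypothetical.

```python
def insert_first(twin_chain, new_segment):
    """Place a new segment occurrence in front of the existing ones."""
    twin_chain.insert(0, new_segment)


# SALARY occurrences under one NAME, added in chronological order.
salary_history = []
insert_first(salary_history, {"effective": "1976", "amount": 15000})
insert_first(salary_history, {"effective": "1977", "amount": 16500})
insert_first(salary_history, {"effective": "1978", "amount": 18500})

current_salary = salary_history[0]  # the first occurrence is the most recent
```

A program that wants the current salary therefore needs only the first occurrence, while a program that wants the history reads on through the chain.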

• The physical Payroll Index data base (Exhibit IV-F9) is defined to meet the
HIDAM organization requirements for the PAYROLLP Payroll physical data
base defined earlier. It is defined as an INDEX data base (ACCESS) which
resides on a 3330 drive in the data set named INDXDD1.

This  Index  data  base  comprises  one  root  segment  named  INDEX  (20 
bytes  long)  and  contains  a logical  child  relationship  to  the  NAME  root 
segment  in  the  PAYROLLP  HIDAM  Payroll  data  base. 

The  index  is  organized  on  the  FULLNAM  field  in  the  NAME  root 
segment  in  the  PAYROLL  data  base. 

The field INDXSEQ is defined as a sequence field of 20 bytes within the
root segment.

• The Skills data base is defined in Exhibit IV-F10 as a logical data base
consolidating the logical relationship established between the SKILLSP
physical data base and the PAYROLLP physical data base.

This logical data base is referred to as the SKILLS data base and is
defined with a LOGICAL access method and a LOGICAL data set.

• Each  segment  within  the  logical  data  base  is  defined  with  reference  to  the 
source  of  that  logical  segment  in  its  appropriate  physical  data  base. 

Thus,  the  SKILL  root  segment  is  defined  as  having  its  source  as  the 
SKILL  root  segment  in  the  SKILLSP  physical  data  base. 
EXHIBIT IV-F9

PHYSICAL PAYROLL INDEX DATA BASE DEFINITION


PAYROLL INDEX (PHYSICAL)

DBD      NAME=INDEXDB,ACCESS=INDEX
DATASET  DD1=INDXDD1,DEVICE=3330,OVFLW=INDXDD2
SEGM     NAME=INDEX,BYTES=20
LCHILD   NAME=(NAME,PAYROLLP),INDEX=FULLNAM
FIELD    NAME=(INDXSEQ,SEQ,U),BYTES=20,START=1
DBDGEN
FINISH
END
EXHIBIT IV-F10

SKILLS DATA BASE - LOGICAL DEFINITION

SKILLS DATA BASE (LOGICAL)

DBD      NAME=SKILLS,ACCESS=LOGICAL
DATASET  DD1=LOGICAL
SEGM     NAME=SKILL,SOURCE=((SKILL,,SKILLSP))
SEGM     NAME=NAME,PARENT=SKILL,
         SOURCE=((NAMEP,KEY,SKILLSP),(NAME,DATA,PAYROLLP))
SEGM     NAME=EDUC,PARENT=NAME,SOURCE=((EDUC,,SKILLSP))
SEGM     NAME=EXPR,PARENT=NAME,SOURCE=((EXPR,,SKILLSP))
SEGM     NAME=ADDR,PARENT=NAME,SOURCE=((ADDR,,PAYROLLP))
SEGM     NAME=SALARY,PARENT=NAME,SOURCE=((SALARY,,PAYROLLP))
DBDGEN
FINISH
END
The NAME segment is defined as being dependent on the SKILL parent
segment and has its source as the NAMEP segment in the SKILLSP data
base.

This is a logical relationship to the data for the NAME segment in the
PAYROLLP physical data base.

The EDUC and EXPR logical segments are defined as dependents of the
NAME logical segment (PARENT), with the source of these logical
segments being the EDUC and EXPR segments in the SKILLSP physical
data base.

Similarly,  the  ADDR  and  SALARY  logical  segments  are  dependents  of 
the  NAME  logical  segment  (PARENT),  and  the  source  of  their  data  is 
the  ADDR  and  SALARY  segments  in  the  PAYROLLP  physical  data 
base. 

It  is  through  this  logical  data  base  that  the  application  program  can  obtain 
access  to  all  or  part  of  the  segments  in  the  two  physical  data  bases. 
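
The way a logical data base draws each segment from its source physical data base can be sketched in Python. This is a simplified model and not DL/I itself: the dictionaries, the `logical_record` function, the sample values, and the placement of EDUC and EXPR beneath NAMEP are assumptions made for illustration. Only the segment and data base names come from Exhibit IV-F10.

```python
# Illustrative model only: two "physical data bases" as nested dictionaries.
SKILLSP = {
    "CARPENTER": {                      # SKILL root segment, keyed by skill code
        "NAMEP": {                      # NAMEP segments, keyed by name
            "JONES": {"EDUC": ["TRADE SCHOOL"], "EXPR": ["5 YEARS"]},
        },
    },
}
PAYROLLP = {
    "JONES": {"NAME": "JONES", "ADDR": ["12 ELM ST"], "SALARY": [15000]},
}

def logical_record(skill_code):
    """Assemble the SKILLS logical view for one SKILL root occurrence."""
    record = {"SKILL": skill_code, "NAME": []}
    for name_key, skill_side in SKILLSP[skill_code]["NAMEP"].items():
        payroll_side = PAYROLLP[name_key]      # follow the logical relationship
        record["NAME"].append({
            "NAME": payroll_side["NAME"],      # data sourced from PAYROLLP
            "EDUC": skill_side["EDUC"],        # sourced from SKILLSP
            "EXPR": skill_side["EXPR"],        # sourced from SKILLSP
            "ADDR": payroll_side["ADDR"],      # sourced from PAYROLLP
            "SALARY": payroll_side["SALARY"],  # sourced from PAYROLLP
        })
    return record

view = logical_record("CARPENTER")
```

The caller sees one record and has no need to know which physical data base supplied each part.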

The PAYROLL logical data base is similarly defined in Exhibit IV-F11, with
the data base name PAYROLL, a LOGICAL access method, and a LOGICAL
data set. As in the SKILLS logical data base, each logical segment within
the PAYROLL logical data base is defined with reference to its source
segment in the appropriate physical data base.

c.  Physical  Data  Base  Organization 

The physical layout of each of the segments of the SKILLS and PAYROLL
physical data bases, with their established logical relationship on NAME, is
shown in Exhibit IV-F12. This indicates the DL/I pointer and prefix
information used to establish the relationships and retrieval based on the
Data Definition Language statements described above.
EXHIBIT IV-F11

PAYROLL DATA BASE - LOGICAL DEFINITION

PAYROLL DATA BASE (LOGICAL)

DBD      NAME=PAYROLL,ACCESS=LOGICAL
DATASET  DD1=LOGICAL
SEGM     NAME=NAME,SOURCE=((NAME,,PAYROLLP))
SEGM     NAME=ADDR,PARENT=NAME,SOURCE=((ADDR,,PAYROLLP))
SEGM     NAME=SALARY,PARENT=NAME,SOURCE=((SALARY,,PAYROLLP))
SEGM     NAME=SKILL,PARENT=NAME,
         SOURCE=((NAMEP,KEY,SKILLSP),(SKILL,,SKILLSP))
SEGM     NAME=EDUC,PARENT=SKILL,SOURCE=((EDUC,,SKILLSP))
SEGM     NAME=EXPR,PARENT=SKILL,SOURCE=((EXPR,,SKILLSP))
DBDGEN
FINISH
END
EXHIBIT IV-F12

PHYSICAL SEGMENT LAYOUT

SEGMENT   LAYOUT (FIELD AND LENGTH)
SKILL     PREFIX (10) | SKILL KEY (21) | REST OF DATA (10)
NAMEP     PREFIX (38) | NAME KEY (20)
EDUC      PREFIX (6)  | EDUC DATA (75)
EXPR      PREFIX (6)  | EXPR DATA (20)
NAME      PREFIX (26) | NAME KEY (20) | REST OF DATA (130)
ADDR      PREFIX (6)  | ADDRESS DATA (200)
SALARY    PREFIX      | SALARY DATA (100)
• The prefix information which DL/I includes in each segment permits reference
through pointers to dependent and/or parent segments such that (based on
which pointer options the Data Base Administrator selects) DL/I can readily
locate related segments.

These pointers may refer to related segments within the same logical
record or physical record without requiring an intervening I/O access.

Alternatively,  related  segments  may  reside  in  other  physical  records 
(blocks)  on  the  same  track  or  same  disk  cylinder  or  may  be  in  different 
data  sets  on  the  same  or  different  disk  drives  based  upon  the  data  base 
organization  selected  by  the  Data  Base  Administrator. 

The particular pointers specified enable the Data Base Administrator to
choose those options which best suit application processing needs
without impacting application programs. The application program is
unaware of the prefix and pointer information (and in fact cannot access
any of it) and is therefore unaffected by changes in that information.

• The logical relationship between the NAMEP segment in the SKILLSP physical
data base and the NAME segment in the PAYROLLP physical data base has
been established in this example using the name as a key.

This provides a symbolic logical relationship.

DL/I enables logical relationships to be specified as symbolic or direct.

A direct logical relationship uses a direct pointer rather than a
symbolic key, with a consequent reduction in the amount of redundant key
information carried in logically related segments.
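
The difference between a symbolic and a direct logical relationship can be illustrated with a short Python sketch. The storage locations, the `key_index` lookup table, and both pointer layouts are hypothetical stand-ins chosen for this example; they do not reproduce the DL/I prefix format.

```python
# Target data base: segments stored at numbered locations
# (stand-ins for disk addresses).
storage = {
    101: {"FULLNAM": "JONES", "DEPT": "A1"},
    102: {"FULLNAM": "SMITH", "DEPT": "B2"},
}
key_index = {segment["FULLNAM"]: location for location, segment in storage.items()}

# A symbolic pointer carries the target's key; a direct pointer carries its location.
symbolic_pointer = {"target_key": "JONES"}
direct_pointer = {"target_location": 101}

def follow_symbolic(pointer):
    """Resolve the key through a lookup, then fetch the segment."""
    return storage[key_index[pointer["target_key"]]]

def follow_direct(pointer):
    """Fetch the segment straight from its stored location."""
    return storage[pointer["target_location"]]
```

Both functions reach the same segment. The symbolic form repeats the key value in the pointing segment, while the direct form stores only a location.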
As was indicated above, the two physical data bases (SKILLSP and
PAYROLLP) have been defined in this example as residing in different
data sets that may be on the same or different 3330 disk drives.


Within  each  data  set,  depending  upon  the  DL/I  data  base  access  method 
selected,  all  of  the  related  segments  for  a particular  SKILL,  or  a 
particular  NAME,  can  reside  within  the  same  logical  record  and  be 
retrieved  in  one  physical  I/O  access. 


Access  to  all  of  the  information  across  the  two  physical  data  bases  will 
naturally  require  at  least  two  physical  I/O  accesses. 


Additionally,  the  specification  of  the  PAYROLLP  physical  data  base  as 
a HIDAM  data  base  will  first  require  an  automatic  access  by  DL/I  to  its 
index,  followed  by  an  access  by  DL/I  to  the  PAYROLLP  data  base. 

Alternatively,  the  NAME  key  could  be  randomized  to  a specific  direct 
location  in  the  data  base  and  the  data  base  may  be  defined  as  HDAM 
for  random  access  resulting  in  only  one  I/O  access. 
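
The access counts described above (index plus data base for HIDAM, a single access for HDAM) can be modelled as follows. This is a sketch under stated assumptions: `CountingStore`, the `randomize` routine built on CRC-32, and the slot count are invented for the example, and synonyms (two keys randomizing to the same location) are ignored.

```python
import zlib

class CountingStore:
    """Dictionary-backed 'data set' that counts each read as one I/O access."""
    def __init__(self):
        self.blocks = {}
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.blocks[address]

NUM_SLOTS = 97  # assumed size of the randomized root area

def randomize(key):
    """Hypothetical randomizing routine: key -> root slot."""
    return zlib.crc32(key.encode()) % NUM_SLOTS

# HIDAM-style: a separate index data set maps key -> location of the root.
index, hidam_data = CountingStore(), CountingStore()
index.blocks["JONES"] = 7
hidam_data.blocks[7] = {"FULLNAM": "JONES"}

def hidam_get(key):
    location = index.read(key)          # first access: the index
    return hidam_data.read(location)    # second access: the data base

# HDAM-style: the key is randomized directly to a location.
hdam_data = CountingStore()
hdam_data.blocks[randomize("JONES")] = {"FULLNAM": "JONES"}

def hdam_get(key):
    return hdam_data.read(randomize(key))   # single access

hidam_get("JONES")
hdam_get("JONES")
```

After one retrieval of each kind, the HIDAM stores together record two reads and the HDAM store records one.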


If the number of I/O accesses necessary to retrieve all related information is
to be reduced, the Data Base Administrator may optionally combine both
physical data bases into one physical data base with a structure as illustrated
by the SKILLS logical data base or alternatively by the PAYROLL logical data
base. This has the advantage of allowing all related information to reside in
the same logical record.


The example in Exhibit IV-F13 defines Segment Search Arguments for SKILL,
selecting SKILLCODE = CARPENTER, and also for the NAME and SALARY
segments.


The application program calls the CBLTDLI (COBOL to DL/I) interface
routine, passing parameters which specify a Get Unique (GU) direct (random)
retrieval operation using a Program Communication Block called PCB1. This
PCB defines the data base name to be accessed and the logical structure by
which the application program will view that data base.

EXHIBIT IV-F13

EXAMPLE OF SEGMENT SEARCH ARGUMENT DEFINITION

WORKING-STORAGE SECTION.
01  SSA1 PICTURE X(NN)
    VALUE 'SKILL (SKILLCODE = CARPENTER)'.
01  SSA2 PICTURE X(NN)
    VALUE 'NAME *D'.
01  SSA3 PICTURE X(NN)
    VALUE 'SALARY'.
PROCEDURE DIVISION.
    CALL 'CBLTDLI' USING 'GU', PCB1, IOAREA, SSA1
    IF STATUS-CODE1 NOT EQUAL ' ' GO TO DLIERROR.
LOOP.
    CALL 'CBLTDLI' USING 'GNP', PCB1, IOAREA, SSA2, SSA3
    IF STATUS-CODE1 EQUAL 'GE' GO TO FINISH
    ELSE PERFORM PROCESS GO TO LOOP.

With  the  physical  structure  of  the  data  bases  defined  as  discussed  previously, 
each  of  these  calls  will  not  necessarily  result  in  separate  I/O  accesses. 

There will be an I/O access to retrieve the SKILL root segment with
SKILLCODE = CARPENTER. However, on retrieving that segment into
storage, DL/I has access to the EDUC and EXPR segments without any
further access.

As  a logical  relationship  was  established  between  the  Skills  data  base 
and  the  Payroll  data  base,  a second  I/O  access  is  issued  by  DL/I  to 
retrieve  the  NAME  and  SALARY  segments  that  reside  physically  in  the 
Payroll  data  base. 

As  can  be  seen  from  the  above  example,  the  necessary  I/O  accesses 
carried  out  by  DL/I  are  transparent  to  the  application  programmer  and 
the  application  programmer  is  not  even  aware  that  two  physical  data 
bases  and  a logical  relationship  are  involved  in  satisfying  the  program's 
request. 
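
The control flow of Exhibit IV-F13 (one Get Unique, then Get Next within Parent until status 'GE') can be mirrored in Python. The `Pcb` class, the two functions, and the sample records are hypothetical stand-ins for the real call interface; only the call sequence and the 'GE' end-of-data test follow the exhibit.

```python
# Hypothetical in-memory stand-in for one SKILL data base record in the logical view.
RECORDS = {
    "CARPENTER": [("JONES", 15000), ("SMITH", 14000)],   # (NAME, SALARY) pairs
}

class Pcb:
    """Minimal stand-in for a PCB: holds position and the last status code."""
    def __init__(self):
        self.status = "  "
        self.parent = None
        self.position = 0

def get_unique(pcb, skill_code):
    """GU: establish position on the SKILL root; 'GE' means not found."""
    if skill_code not in RECORDS:
        pcb.status = "GE"
        return None
    pcb.status, pcb.parent, pcb.position = "  ", skill_code, 0
    return skill_code

def get_next_within_parent(pcb):
    """GNP: return the next dependent of the current parent, or set 'GE' at the end."""
    dependents = RECORDS[pcb.parent]
    if pcb.position >= len(dependents):
        pcb.status = "GE"
        return None
    pcb.status = "  "
    segment = dependents[pcb.position]
    pcb.position += 1
    return segment

pcb = Pcb()
results = []
if get_unique(pcb, "CARPENTER") is not None:
    while True:
        segment = get_next_within_parent(pcb)
        if pcb.status == "GE":       # same end-of-loop test as the exhibit
            break
        results.append(segment)
```

The program deals only with calls and status codes. How many physical accesses lie behind each call is hidden inside the two functions, as it is inside DL/I.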

The  advantage  which  DL/I  offers  in  data  base  design  is  flexibility. 

The Data Base Administrator can reorganize and change the data base
design to meet changed application requirements without being
constrained by the impact of such changes on application programs.

As can be seen from the above examples, a significant reorganization
and restructuring of the data base can be achieved in a way that is
completely transparent to the application program.

The functions provided in DL/I to achieve this flexibility, while still
maintaining simplicity for the application programmer, do of course
require additional program logic and overhead over that which may be
necessary for other DBMS systems.

However,  the  additional  storage  and  instruction  path  length  which  this 
represents  does  offer  the  installation  complete  flexibility  in  being  able 
to  adjust  to  changing  data  base  requirements. 

d.  Data  Manipulation  Language 

• DL/I  uses  Segment  Search  Arguments  (SSAs)  to  specify  the  selection  criteria 
for  required  segments. 

• Exhibit IV-F14 illustrates the way in which DL/I can be used to add a record
to a data base.

SSA1 specifies the SKILL segment whose SKILLCODE field has the
value of DENTIST.

The application program, having set up the new record to be added in
the IOAREA, issues an ISRT (Insert) CALL specifying through SSA1 and
SSA2 that DL/I is to add the new record in the correct sequence in the
data base depending upon the value of DENTIST as a skill. A blank
status code indicates that no error occurred.
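
Insertion "in the correct sequence" can be pictured with a small Python sketch that keeps root segments in key order. The list layout and the `insert_in_sequence` function are assumptions made for illustration. The blank status follows the exhibit; 'II' is the DL/I status code for an attempt to insert a segment that already exists.

```python
import bisect

# Root segments kept in key sequence; each entry is (SKILLCODE, data).
skill_roots = [("CARPENTER", {}), ("PLUMBER", {})]

def insert_in_sequence(roots, skill_code, data):
    """Insert a new root in key order; return a blank status on success."""
    keys = [key for key, _ in roots]
    position = bisect.bisect_left(keys, skill_code)
    if position < len(keys) and keys[position] == skill_code:
        return "II"                          # duplicate key, nothing inserted
    roots.insert(position, (skill_code, data))
    return "  "

status = insert_in_sequence(skill_roots, "DENTIST", {"NAME": ["JONES"]})
```

The program supplies only the key value. The position at which the record is stored is worked out by the insert routine.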

• Deleting an existing data base record (or updating one) requires first that
the application program issue a GET HOLD request to reserve that record
until the deletion or update (replace) has been completed. Exhibit IV-F15
illustrates this point.

• SSA1 is defined as described previously. SSA2 must be fully defined to specify
the NAME segment whose FULLNAME is JONES.




© 1978  by  INPUT,  Menlo  Park,  CA  94025.  Reproduction  Prohibited. 




EXHIBIT IV-F14

ADDING A RECORD TO A DL/I DATA BASE

WORKING-STORAGE SECTION.
01  SSA1 PICTURE X(NN)
    VALUE 'SKILL (SKILLCODE = DENTIST)'.
01  SSA2 PICTURE X(NN)
    VALUE 'NAME'.
PROCEDURE DIVISION.
*   ASSEMBLE RECORD IN IOAREA
    CALL 'CBLTDLI' USING 'ISRT', PCB1, IOAREA, SSA1, SSA2
    IF STATUS-CODE1 EQUAL ' ' GO TO FINISH
    ELSE GO TO DLIERROR.
EXHIBIT IV-F15

DELETING A RECORD FROM A DL/I DATA BASE

WORKING-STORAGE SECTION.
01  SSA1 PICTURE X(NN)
    VALUE 'SKILL (SKILLCODE = DENTIST)'.
01  SSA2 PICTURE X(NN)
    VALUE 'NAME (FULLNAME = JONES)'.
PROCEDURE DIVISION.
    CALL 'CBLTDLI' USING 'GHU', PCB1, IOAREA, SSA1, SSA2
    IF STATUS-CODE1 NOT EQUAL ' ' GO TO DLIERROR.
    CALL 'CBLTDLI' USING 'DLET', PCB1, IOAREA
    IF STATUS-CODE1 NOT EQUAL ' ' GO TO DLIERROR.
The application program then issues a GHU (Get Hold Unique) call specifying
SSA1 and SSA2 to retrieve the specified segments and reserve them for
subsequent update.

DL/I  retrieves  the  SKILL  and  NAME  segments  for  the  unique  person 
JONES  who  is  a DENTIST. 


Following  this  retrieval  and  reservation  of  the  segments,  the  program 
then  issues  a DLET  (Delete)  which  deletes  the  SKILL  and  NAME 
segment  information. 

DL/I  automatically  deletes  all  dependent  segments  in  both  data  bases 
and  the  NAME  logical  relationship  without  any  additional  application 
programming  necessary. 


The specification of the data base records to be deleted through
identification provided by a SKILLCODE of DENTIST and FULLNAME
of JONES is all that DL/I requires to retrieve the related records in the
two physical data bases and delete all related information and
dependent segments.


Additionally,  DL/I  will  delete  the  reference  in  the  INDEX  data  base  to 
that  person  JONES. 
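
The cascade described above (dependents, the logically related segment, and the index entry all removed by one request) can be sketched in Python. The three dictionaries and the `delete_person` function are invented for illustration and follow this report's description of the behaviour; they are not a model of DL/I delete processing in general.

```python
# Simplified stores; names follow the exhibits, contents are invented.
skillsp = {"DENTIST": {"JONES": {"EDUC": ["DDS"], "EXPR": ["3 YEARS"]}}}
payrollp = {"JONES": {"ADDR": ["12 ELM ST"], "SALARY": [30000]}}
indexdb = {"JONES": "location-of-JONES"}

def delete_person(skill_code, full_name):
    """Delete one person under a skill, with dependents, related data, and index entry."""
    if full_name not in skillsp.get(skill_code, {}):
        return "GE"                               # not found
    del skillsp[skill_code][full_name]            # NAMEP plus EDUC/EXPR dependents
    payrollp.pop(full_name, None)                 # NAME plus ADDR/SALARY dependents
    indexdb.pop(full_name, None)                  # index entry for the deleted root
    return "  "

status = delete_person("DENTIST", "JONES")
```

The caller names only the skill and the person. Every related structure is cleaned up inside the one function.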


2. BASIC FUNCTIONAL CAPABILITIES


a.  Easy  Accessibility 

All three DL/I products offer support of Assembler as a machine-oriented
programming language together with COBOL and PL/I as commercial
programming languages. Additionally, DL/I ENTRY supports RPG II (in the
batch environment only).
• The scientific programming language supported by all three products is PL/I.
Further, as a standard CALL structure is used, FORTRAN can additionally be
used through a user-written interface.

• End User language support is provided for commercial uses by IBM's
Generalized Information System (GIS). GIS DOS/VS is available for both
DL/I ENTRY and DL/I DOS/VS.

While  this  is  a batch  oriented  query  language,  it  can  be  used  for  on-line 
submission  of  queries  and  display  of  results  on  a terminal  through  the 
use  of  IBM's  Customer  Information  Control  System,  together  with  the 
CICS  Source  Program  Maintenance  II  Program. 

• In  the  OS/VS  environment,  GIS/VS  provides  not  only  the  query  language 
capability  of  GIS  DOS/VS,  but  also  the  ability  to  create  and  modify  data  bases 
through  an  end  user  language. 

• GIS/VS  can  be  used  on-line  through  IMS/VS.  (Additionally,  IMS/VS  offers  a 
limited  subset  of  the  GIS/AQF  capability  by  means  of  the  Interactive  Query 
Feature  IMS/VS.) 

• No end user scientific languages are supported as standard by IBM's DL/I
products. However, through APL/SV (A Programming Language/Shared
Variables), a user-written interface can be provided to access DL/I data bases.

• Data  Communication  Support  to  provide  access  to  DL/I  data  bases  is  provided 
by  two  products.  These  are  IBM's  Customer  Information  Control  System 
(CICS)  and  the  Information  Management  System  (IMS). 

• CICS  is  available  in  a DOS/VS  and  OS/VS  version,  and  each  version  provides  a 
multi-task  interface  (CICS-DL/I)  permitting  access  to  all  three  DL/I 
products. 
IMS/VS provides an integrated Data Communications interface (IMS/VS DC) to
IMS/VS data bases, which offers additional levels of security and integrity over
that provided by CICS-DL/I.

b.  Multiple  Views  Of  Data 

All three DL/I products support multiple views of data, permitting both batch
and on-line programs to retrieve data sequentially, randomly, or through an
index, with support for multiple indices.

c.  Data  Consolidation 


It  is  in  the  area  of  data  consolidation  that  some  differences  between  the  three 
DL/I  products  become  apparent. 


DL/I ENTRY supports a maximum of 63 repeating group types per
entity. These are referred to by IBM as segment types; thus, a
maximum of 63 segment types are supported per data base record.


The  other  two  DL/I  products  support  a maximum  of  255  segment  types 
per  data  base  record. 


Within each segment type (repeating group type) all three DL/I products
support an unlimited number of segment occurrences, thus supporting
unlimited variable length records.

DL/I  ENTRY  supports  only  fixed  length  segments  while  the  other  products 
support  both  fixed  and  variable  length  segment  occurrences. 

All three products support a maximum of 15 nested levels. The net result is to
provide virtually unlimited variable length support. The limit of 15 levels and
255 segment types per data base record far exceeds the maximum
requirements of all but the most complex data bases.

Typical data bases generally involve no more than 20 to 30 segment
types, nested to a depth of no more than 5 to 8 levels.
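
The limits quoted above lend themselves to a simple check of a proposed hierarchy. In this Python sketch the `LIMITS` table restates the figures in the text, while the parent-to-children dictionary format and both function names are assumptions made for the example.

```python
LIMITS = {  # figures quoted in the text above
    "DL/I ENTRY": {"segment_types": 63, "levels": 15},
    "DL/I DOS/VS": {"segment_types": 255, "levels": 15},
    "IMS/VS": {"segment_types": 255, "levels": 15},
}

def depth(hierarchy, segment):
    """Number of levels from `segment` down to its deepest dependent."""
    children = hierarchy.get(segment, [])
    return 1 + max((depth(hierarchy, child) for child in children), default=0)

def within_limits(product, hierarchy, root):
    """Check a parent -> children map against the product's stated limits."""
    segment_types = set(hierarchy) | {c for kids in hierarchy.values() for c in kids}
    limits = LIMITS[product]
    return (len(segment_types) <= limits["segment_types"]
            and depth(hierarchy, root) <= limits["levels"])

# The SKILLS logical structure from Exhibit IV-F10: 6 segment types, 3 levels.
skills = {"SKILL": ["NAME"], "NAME": ["EDUC", "EXPR", "ADDR", "SALARY"]}
```

The SKILLS structure, with six segment types on three levels, sits far inside the limits of all three products.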

• The entity relationships supported by DL/I refer to the total number of data
base records. No limit is placed by any of the data base products on the total
number of records in a data base.

• Each  segment  can  have  a relationship  established  up  to  the  maximum  number 
of  segment  types  supported,  less  one  representing  itself,  in  the  data  base. 

• The  total  number  of  segment  relationships  supported  per  program  is  20  for 
DL/I  ENTRY  and  32  for  DL/I  OS/VS,  expandable  to  64.  The  maximum 
segment  relationships  supported  by  DL/I  DOS/VS  is  not  known. 

3.  DATA  INDEPENDENCE 
a.  Levels  of  Mapping 

• Each  DL/I  product  offers  a high  degree  of  data  independence,  supporting  three 
levels  of  mapping. 

The Internal level of mapping is referred to as the Data Base
Description (DBD) and defines the physical data base organization.

The External map is referred to as the Program Specification Block
(PSB) and defines that subset of the full data base record that the
application program is permitted to view.

The Conceptual level enables the physical structure of the data base to
be transformed into a logical structure. This logical view may
represent a substantially different logical structure from that physically
recorded on the data base.
In this way, provided that logical view is maintained, the data base may
be significantly reorganized to meet changed application requirements
without requiring much, or any, program modification.
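
The role of the External map can be shown with a minimal Python sketch in which a PSB is reduced to the set of segment types a program may see. `MAILING_PSB`, `program_view`, and the sample record are hypothetical; a real PSB carries considerably more information than this.

```python
# PSB stand-in: the segment types one (hypothetical) mailing program may view.
MAILING_PSB = {"NAME", "ADDR"}

def program_view(record, psb):
    """Return only the segments the PSB lets the program see."""
    return {segment: data for segment, data in record.items() if segment in psb}

stored = {"NAME": "JONES", "ADDR": "12 ELM ST", "SALARY": 15000}
visible = program_view(stored, MAILING_PSB)
```

Because the program receives only the segments named in its own view, a segment type added to the stored record later does not reach it unless the view is changed.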


While  each  DL/I  product  allows  the  segment  to  be  defined  as  comprising  one 
or  several  fields  and  DL/I  permits  each  field  in  a segment  to  be  named  in  the 
Data  Base  Description  (DBD),  these  field  definitions  are  used  solely  for 
searching  and  identifying  data  base  records  based  upon  field  values.  DL/I  does 
not  otherwise  provide  field  level  data  independence. 


b.  Data  Base  Changes 

The  Data  Independence  capability  of  DL/I  can  be  assessed  by  considering  each 
of  the  following  changes  in  turn  and  its  impact  on  the  need  to  reload  the  data 
base,  recompile  programs  or  change  program  logic. 


Change Device Type: For each of the DL/I products, a change in
device type is accomplished by regenerating the Data Base Description
(DBD), specifying the new device type to be used. While this involves
reloading the data base, the program logic does not have to be changed
and the programs need not be recompiled. DL/I determines the device
type from the DBD and accesses the data base on the different device
type without any program modification.


Change Access Method: In the same way, the access method may be
changed by specifying a new access method in the DBD and
regenerating it. This should be considered in two areas: the operating
system access method and the DL/I access methods used.


Change Entity View: By regenerating the DBD, the physical structure
of the data base may be changed to reflect a difference in application
processing requirements. As DL/I provides a Conceptual map (the
logical structure of the data base as viewed by the program), provided
that logical structure does not change, no change is necessary in the
application program logic and the programs need not be recompiled. Of
course, the data base would need to be reloaded to reflect the
different physical structure.

Add  New  Entity:  The  addition  of  a new  record  to  the  data  base  requires 
no  application  program  logic  changes  nor  program  recompilation,  and 
the  data  base  does  not  need  to  be  reloaded.  This  addition  can  be 
accomplished  dynamically  in  either  a batch  or  on-line  environment. 

Add New Repeating Group Type: The addition of a new segment type
to the data base can be transparent to existing application programs.
These application programs continue to view only those segments of the
data base that they need to process and are not aware of the new
segment type unless they also need to process this segment. In this
case, the only change necessary is to reload the data base to
incorporate the new segment type. No program recompilation or logic
change is necessary.

Add New Relationship: Relationships can be established between
segment types in a data base or between segment types in different
data bases in a way that is transparent to existing application programs.
Provided these application programs do not need to be aware of the new
relationship for application reasons, DL/I only requires reloading of the
data base to incorporate the new relationship and does not require a
change in program logic or program recompilation.

Add New Field To Repeating Group: The addition of a new field to an
existing segment type requires a reload of the data base to add that
new field and, in general, will also require recompilation of the
programs which use that segment type. As the segment is the smallest
element of data retrieved by DL/I, the addition of a field to a segment
is analogous to adding a field to a record using traditional data
management.
Change Field Format: A change in field format within a segment will
have the same impact on an application program using DL/I as a change
of record format using traditional data management techniques.
Provided fields are referred to in the program symbolically (in the case
of Assembler language programs), the program logic change may be
minimal unless there has been a significant change in the internal field
format (character to binary, for example). This change will certainly
require program recompilation as well as reloading the data base for all
three DL/I products.
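
The changes discussed above can be collected into a single lookup table. The Python sketch below records only what the text states; `None` marks a cell the text leaves open or conditional, and Change Access Method is omitted because its impact is described only as "in the same way". The table layout and the helper function are assumptions made for this summary.

```python
# (reload data base, recompile programs, change program logic)
# True/False as stated in the text; None where the text is silent or conditional.
# "change entity view" assumes the program's logical structure is kept unchanged.
IMPACT = {
    "change device type":           (True,  False, False),
    "change entity view":           (True,  False, False),
    "add new entity":               (False, False, False),
    "add new repeating group type": (True,  False, False),
    "add new relationship":         (True,  False, False),
    "add new field":                (True,  True,  None),
    "change field format":          (True,  True,  None),
}

def changes_requiring(column):
    """List the changes whose flag in the given column (0, 1, or 2) is True."""
    return [change for change, flags in IMPACT.items() if flags[column]]

needs_recompile = changes_requiring(1)
```

Read by column, the table shows that only the field-level changes force recompilation, which matches the earlier remark that DL/I does not provide field-level data independence.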

4. DATA INTEGRITY

a.  Exclusive  Control 


Each DL/I product provides ENQ/DEQ logic to ensure that only one user at a
time is able to update a given part of the data base.

Consider programs executing in two or more operating system partitions,
each of which attempts to update the data base simultaneously. IMS/VS and
DL/I DOS/VS provide Exclusive Control at the repeating group occurrence
level (that is, the segment occurrence level).

The  segment  occurrence  is  also  the  lowest  isolated  level  and  will  only 
cause  another  user  to  wait  if  it  attempts  to  update  that  same  segment 
occurrence. 

On the other hand, DL/I ENTRY enqueues at the dataset level within the data
base. While a data base may be made up of several datasets (files), to ensure
no attempt is made to simultaneously update the data base from more than
one partition, DL/I ENTRY allows only one partition to carry out updates
against any one dataset in a data base. Other partitions can read the
records in that dataset but may not update them.




• Where  two  or  more  users  within  the  same  partition  attempt  to  simultaneously 
update  the  same  data  base,  DL/I  DOS/VS  ENQs  on  the  segment  occurrence. 
(This  is  also  the  lowest  isolated  level  except  in  the  case  of  DL/I  ENTRY  where 
the  dataset  is  the  lowest  isolated  level).  Thus,  provided  two  users  do  not 
simultaneously  attempt  to  update  the  same  segment  occurrence  in  a data  base, 
they  are  able  to  process  simultaneously.  Otherwise,  one  user  will  wait  until 
the  other  user  has  finished  its  update  of  the  particular  segment  occurrence. 
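
The two enqueue levels can be contrasted in a few lines of Python. The `conflicts` function and the (dataset, segment occurrence) tuples are hypothetical; the sketch shows only when one update must wait for another at each level.

```python
def conflicts(granularity, update_a, update_b):
    """Each update is (dataset, segment_occurrence). True if B must wait for A."""
    if granularity == "dataset":        # DL/I ENTRY style
        return update_a[0] == update_b[0]
    if granularity == "segment":        # IMS/VS and DL/I DOS/VS style
        return update_a == update_b
    raise ValueError(granularity)

jones = ("SKILLDS", "JONES")
smith = ("SKILLDS", "SMITH")
```

Updates to JONES and SMITH in the same dataset conflict at the dataset level but proceed together at the segment occurrence level.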

• The  Exclusive  Control  approach  adopted  by  DL/I  ENTRY  prevents  the 
possibility  of  any  deadlock  occurring.  However,  some  sacrifice  in  performance 
capability  has  been  made  to  prevent  the  occurrence  of  deadlocks. 

In  the  case  of  IMS/VS  and  DL/I  DOS/VS,  a deadlock  is  only  possible 
where  two  or  more  concurrently  executing  users  attempt  to  update  two 
or  more  segment  occurrences  but  not  in  the  same  sequence. 

This deadlock situation is detected by the IMS/VS Data Communications
feature, and by DL/I DOS/VS, and is automatically resolved by
abnormally terminating one of the deadlocked users and backing out its
processing completely to a previously defined synchronization point.

The  first  user  is  then  permitted  to  complete  processing  after  which  the 
terminated  user  is  automatically  rescheduled  (in  the  case  of  IMS/VS)  to 
reprocess  and  complete  its  activity. 

Thus,  while  deadlock  is  possible,  IMS/VS  and  DL/I  DOS/VS  detect  and 
correct  this  situation.  The  result  is  full  data  integrity  together  with 
the  best  possible  performance  capability. 
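
Deadlock detection of the kind described above is commonly pictured as a search for a cycle among waiting users. The Python sketch below is one such illustration and is not a description of the IMS/VS or DL/I DOS/VS internals: the `waits_for` table, the `find_deadlock` function, and the choice of victim are all assumptions.

```python
def find_deadlock(waits_for):
    """Return a list of users forming a wait cycle, or None.

    waits_for maps each waiting user to the single user it is waiting on.
    """
    for start in waits_for:
        seen = [start]
        current = waits_for.get(start)
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_for.get(current)
        if current is not None:
            return seen[seen.index(current):]
    return None

# User A holds segment 1 and wants 2; user B holds 2 and wants 1.
holders = {"SEG1": "A", "SEG2": "B"}
requests = {"A": "SEG2", "B": "SEG1"}
waits_for = {user: holders[segment] for user, segment in requests.items()}

cycle = find_deadlock(waits_for)
victim = cycle[-1]            # arbitrary choice of which user to back out
del waits_for[victim]         # backing out withdraws the victim's request
```

Once the victim is backed out, no cycle remains and the other user can complete, after which the victim can be rescheduled.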

• This  facility  is  referred  to  as  Program  Isolation  and  its  use  is  transparent  to 
the  terminal  user. 

• Each of the three DL/I products accepts responsibility for establishing
Exclusive Control and maintaining isolation from concurrent updates.

DL/I ENTRY avoids the possibility of deadlocks by the Exclusive Control
technique it adopts. IMS/VS and DL/I DOS/VS detect and resolve deadlocks
through the use of Program Isolation.

5. RECOVERY/RESTART

a.  Recovery 


Both full function DL/I products accept responsibility for logging after-images
of data base records to permit system recovery in the event of an
unrecoverable I/O error. DL/I ENTRY in the batch environment does not log
any after-images but uses a data base copy capability with reprocessing to
recover from an I/O error.


In  the  on-line  environment,  DL/I  ENTRY  uses  CICS/DOS/VS-provided 
support  to  log  after-images.  These  after-images  can  subsequently  be 
utilized  by  a separately  available  CICS  Forward  Recovery  program  to 
reconstruct  the  data  base  following  an  unrecoverable  I/O  error. 


Each  DL/I  product  provides  Copy/Restore  utilities  to  permit  a data  base  dump 
to  be  taken  periodically  for  use  as  backup  in  the  event  of  possible  future 
damage  to  the  data  base. 

DL/I  products  other  than  DL/I  ENTRY  provide  additional  utility  support 
permitting  the  system  log  activity  to  retain  a summary  log  containing  the 
latest  version  of  each  updated  segment  in  the  data  base. 

This  summary  log  is  sorted  into  the  same  sequence  as  the  data  base  and 
is  used  to  update  the  backup  copy  of  the  data  base. 
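
The summary log can be modelled in Python as "latest after-image per segment, in key sequence". The record layout (key, after-image) and both function names are assumptions made for this sketch; the actual utilities work on log tapes and image copies.

```python
def accumulate(log_records):
    """Keep only the latest after-image per segment key, in key sequence.

    log_records is a list of (key, after_image) in the order they were logged.
    """
    latest = {}
    for key, after_image in log_records:
        latest[key] = after_image        # later entries overwrite earlier ones
    return sorted(latest.items())

def apply_to_backup(backup, summary):
    """Return a recovered copy: the backup with the summary log merged in."""
    recovered = dict(backup)
    recovered.update(summary)
    return dict(sorted(recovered.items()))

backup = {"ADAMS": 1, "JONES": 1, "SMITH": 1}
log = [("SMITH", 2), ("JONES", 2), ("SMITH", 3), ("BAKER", 1)]
summary = accumulate(log)
recovered = apply_to_backup(backup, summary)
```

Four log records shrink to three summary entries, and because the summary is in data base sequence it can be applied to the backup copy in a single pass.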

A DL/I-provided utility permits users to recover a physically damaged data
base very efficiently without requiring any additional user programming.

• The smallest recoverable unit for DL/I ENTRY is an entire data base. The
smallest recoverable unit for DL/I DOS/VS is a dataset within a data
base, while IMS/VS permits recovery to be carried out only on the particular
track which was damaged in the data base.

b.  Batch  Restart 

• DL/I  ENTRY  (in  the  batch  environment)  does  not  provide  support  for  logging 
before-images  to  be  used  in  restarting  a program  which  has  abnormally 
terminated.  The  other  two  DL/I  products  accept  full  responsibility  for  logging 
before-images  in  the  event  of  batch  program  failure. 

• Utility  support  provided  by  IMS/VS  permits  the  log  tape  to  be  terminated  in 
the  event  of  a system  failure  (such  as  power  failure).  This  is  done  by 
extracting  from  a storage  dump  the  updated  data  base  records  in  the  DL/I 
buffer  pool  which  have  not  been  written  back  to  the  data  base  at  the  time  of 
failure. 

• System/370  has  an  optional  power  warning  feature  which  forces  an  immediate 
hardware  dump  of  storage  to  disk  in  the  event  of  power  failure.  The  IMS/VS 
Log  Terminator  utility  uses  this  dump  to  ensure  normal  closing  of  the  log  tape. 

• DL/I  DOS/VS  does  not  provide  a log  tape  termination  capability.  In  the  event 
of  a power  failure,  updated  data  base  records  in  the  buffer  pool  are  lost. 

This  does  not  represent  any  loss  of  data  integrity  but  may  require  some 
additional  backout  activity,  since  backout  occurs  based  only  on  the 
physical  contents  of  the  log  tape. 
To minimize the impact of lost data base records in the buffer pool on a
system failure, DL/I DOS/VS uses a log write-ahead facility such that
all data base updates in the buffer pool are first recorded on the system
log, even though the data base record itself may not be written back to
the data base until some later time (and may perhaps even be lost in the
event of a system failure).

This  means  that  the  system  log  reflects  all  data  base  activity  up  to  the 
time  of  failure  and  permits  data  integrity  to  be  preserved  fully. 
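
The log write-ahead rule (record the change on the log before changing the buffered copy) can be demonstrated with a small Python class. `WriteAheadStore` and its methods are invented for this example. For simplicity the sketch reapplies logged changes after a failure, whereas the text describes the log being used for backout; the ordering rule being illustrated is the same in both cases.

```python
class WriteAheadStore:
    def __init__(self):
        self.log = []        # survives a failure (the system log)
        self.buffer = {}     # lost on a failure (the buffer pool)
        self.disk = {}       # the data base itself

    def update(self, key, value):
        self.log.append((key, value))    # log first ...
        self.buffer[key] = value         # ... then change the buffered copy

    def flush(self):
        self.disk.update(self.buffer)
        self.buffer.clear()

    def crash(self):
        self.buffer.clear()              # buffered updates are lost

    def recover(self):
        for key, value in self.log:      # reapply logged changes in order
            self.disk[key] = value

store = WriteAheadStore()
store.update("JONES", 15000)
store.flush()
store.update("JONES", 16000)
store.crash()
value_on_disk_after_crash = store.disk["JONES"]
store.recover()
```

The second update never reached the data base, yet the log still holds it, so the log alone is enough to bring the data base to a consistent state.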

DL/I ENTRY, because it provides no logging of before-images, does not
provide any batch backout utility support.


IMS/VS  and  DL/I  DOS/VS  permit  a batch  program  to  specify  intermediate 
restart  points  through  the  use  of  a checkpoint  CALL.  These  permit  a batch 
program  to  restart  processing  at  some  logical  point  during  program  processing, 
without  requiring  full  backout  to  the  start  of  the  program.  DL/I  ENTRY  does 
not  provide  intermediate  restart  checkpoints. 
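
The effect of intermediate restart points can be illustrated with a small sketch. It is a hedged model only: `run_batch`, its parameters and the doubling used as "processing" are hypothetical, and a real checkpoint CALL records far more state than a single position.

```python
def run_batch(records, output, checkpoint_every, start_at=0, fail_at=None):
    """Process records from start_at; return the position to restart from.

    `output` must already hold one result per record before start_at.
    """
    last_checkpoint = start_at
    for i in range(start_at, len(records)):
        if i == fail_at:
            # Abnormal termination: back out work done since the checkpoint.
            del output[last_checkpoint:]
            return last_checkpoint
        output.append(records[i] * 2)        # stand-in for real processing
        if (i + 1) % checkpoint_every == 0:
            last_checkpoint = i + 1          # stand-in for a checkpoint CALL
    return len(records)


records = list(range(10))
output = []
restart_point = run_batch(records, output, checkpoint_every=4, fail_at=6)
after_failure = list(output)
finished_at = run_batch(records, output, checkpoint_every=4,
                        start_at=restart_point)
```

In this run the program fails on the seventh record, is backed out only as far as the checkpoint taken after the fourth, and resumes from there rather than from the start of the program.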


c.  On-Line  Restart 


On-line restart is provided by the IMS/VS DC feature, which accepts
responsibility for logging all input and output messages and directs this
log activity to the same log tape as used by the data base.

Also,  in  the  OS/VS  environment,  CICS/OS/VS  can  be  used  to  access  DL/I  data 
bases.  CICS/DOS/VS  is  used  to  access  DL/I  DOS/VS  and  DL/I  ENTRY  data 
bases  in  the  DOS/VS  environment. 

The  facilities  provided  by  CICS  (both  DOS/VS  and  OS/VS)  are  utilized  by  DL/I 
for  on-line  restart. 


CICS/DOS/VS accepts responsibility for logging of input and output
messages but only for those terminals supported by the Virtual
Telecommunication Access Method (VTAM) or the Extended
Telecommunications Module (EXTM) feature of CICS/DOS/VS.

VTAM and EXTM are used to support IBM's Systems Network Architecture
(SNA) communications philosophy supporting communication with
programmable "minicomputers."

Both CICS and the programmable minicomputer participate in message
recovery and resynchronization on emergency restart following a system
failure.

CICS  permits  on-line  DL/I  data  base  activity  to  be  directed  to  the 
same  log  as  used  by  the  CICS/VS  system. 

• Log  synchronization  for  both  on-line  and  data  base  activity  is  provided  by  the 
CICS  and  IMS/VS  DC  systems.  The  user  program  can  also  participate  in  log 
synchronization  specifying  intermediate  restart  (synchronization)  points  during 
the  processing  of  an  on-line  transaction. 

During emergency restart, CICS and IMS back out incompletely
processed transactions only as far as the most recent intermediate
synchronization point.

• Both  CICS  and  IMS/VS  provide  a dynamic  backout  capability  for  task  restart  in 
the  event  of  a program  failure. 

This  is  provided  through  the  Program  Isolation  feature  of  IMS/VS,  which 
automatically  reschedules  and  restarts  the  task  after  dynamically 
backing  out  the  processing  of  the  abnormally  terminated  transaction. 

CICS/VS requires the terminal operator to resubmit a transaction that
CICS has dynamically backed out for reprocessing.

CICS/VS and IMS/VS DC provide the ability for system restart (both warm
start and emergency restart), re-establishing the on-line system and DL/I data
bases to some prior status.



Messages  are  not  automatically  reprocessed  by  any  of  the  on-line 
systems,  but  an  option  is  given  to  the  user  to  reprocess  messages  that 
had  been  logged  if  the  application  demands  it. 

Messages  whose  processing  was  incomplete  at  the  time  of  failure,  and 
which  have  been  backed  out  during  emergency  restart,  may  not  need  to 
be  re-entered  but  are  made  available  by  IMS/VS  or  CICS/VS  for 
reprocessing. 

The terminal operator is not required to re-enter the original transaction,
particularly in a VTAM or EXTM environment.


When  messages  are  reprocessed,  they  are  processed  in  the  original 
sequence. 


6.  DATA  SECURITY 

• The  Program  Specification  Block  (PSB)  is  the  mechanism  used  by  all  DL/I 
products  to  restrict  unauthorized  access  to  the  data  base. 

The Data Base Administrator defines only that part of the data base
that the application program is authorized to operate on, to the level of
a repeating group type (i.e., a segment type).

The PSB enables the Data Base Administrator to ensure that the
application program has access only to those specific segment types for
which it is authorized. The PSB also enables him to control the level of
access to each segment type as being read only, read and update only,
add, delete or exclusive use access.

With  exclusive  use  access,  no  other  user  is  able  even  to  read  the  same 
segment  type  concurrently  with  another  user  who  has  exclusive  use  of 
that  segment. 

• To  ensure  the  fullest  possible  data  base  security  and  control,  the  programmer 
is  not  involved  in  enforcing  this  level  of  security. 

It  is  completely  outside  his  control  and  is  separately  specified  by  the 
Data  Base  Administrator  who  separately  generates  the  PSB. 

The application program can access only that part of the data base that
has been specified for it in the PSB.
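
The idea of a separately defined specification that limits what a program may do to each segment type can be sketched as follows. This is an illustrative model only: `ProgramSpec`, `AccessDenied`, `is_permitted`, the segment names and the operation names are all hypothetical and do not reproduce PSB macro syntax.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a program requests access its specification does not grant."""


@dataclass
class ProgramSpec:
    """Hypothetical stand-in for a PSB: segment type -> permitted operations."""
    program: str
    permitted: dict = field(default_factory=dict)

    def check(self, segment_type, operation):
        allowed = self.permitted.get(segment_type, frozenset())
        if operation not in allowed:
            raise AccessDenied(f"{self.program}: {operation} on {segment_type}")
        return True


# Defined by the administrator, separately from the application program.
payroll_spec = ProgramSpec(
    program="PAYROLL",
    permitted={
        "EMPLOYEE": frozenset({"read"}),
        "SALARY": frozenset({"read", "update"}),
    },
)


def is_permitted(spec, segment_type, operation):
    """Return True if the request is allowed, False if it would be rejected."""
    try:
        return spec.check(segment_type, operation)
    except AccessDenied:
        return False
```

A segment type that is not named in the specification is simply invisible to the program, which mirrors the point that the programmer plays no part in enforcing the restriction.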

• An  additional  level  of  security  is  offered  by  IMS/VS  DC  over  that  provided  by 
CICS/VS.  IMS/VS  DC  permits  not  only  the  above  level  of  data  security  to  be 
enforced  through  the  PSB  but  also  permits  security  to  be  enforced  through  the 
use  of  passwords  at  the  data  base,  transaction  and  physical  terminal  level. 

Thus,  a user  must  have  the  correct  password  to  be  able  to  utilize  a 
specific  terminal.  Additionally,  he  may  be  required  to  specify  a 
different  password  to  utilize  each  separate  transaction  (or  group  of 
transactions)  and  a different  password  to  access  specific  data  bases. 

• CICS/VS  provides  a level  of  security  almost  comparable  to  this  through  the  use 
of  a sign-on  security  mechanism. 

The terminal user must sign on with a specific name and password and
is then given clearance on one or several security levels.

While the terminal operator can use any physical CICS/VS terminal, he
is only permitted to enter those transactions that have been authorized
for the security levels allocated to him.


• With either CICS/VS or IMS/VS, any security violation is reported to the
Master Terminal Operator, identifying the terminal operator, physical
terminal and type of operation attempted.


7.  EASE  OF  USE 

a.  Data  Base  Administrator 


• The Data Definition Language (DDL) for all three DL/I products is a macro
language. This is used to define the Data Base Description (DBD) and the
Program Specification Block (PSB) using macro statements, which are
subsequently assembled into the DBD and PSB for execution.

• Design  aids  are  provided  separately  for  the  DL/I  products  and  are  separately 
chargeable. 


Data  Base  Design  Aid  (DBDA):  This  program  can  be  used  to  design  a 
suggested  physical  and  logical  data  base  structure,  given  an  existing  file 
format  and  the  actual  file.  It  examines  the  file  and,  based  on  frequency 
of  occurrence  of  specific  fields,  it  recommends  how  these  fields  should 
be  grouped  in  segments  and  how  the  segments  should  be  organized  in  a 
data  base  structure. 


DBPROTOTYPE: This is available primarily for IMS/VS and permits a
prototype data base to be specified and loaded with dummy data. This
prototype data base can then be evaluated for performance by means of
specified data base calls which are generated to model prototype
programs.



© 1978  by  INPUT,  Menlo  Park,  CA  94025.  Reproduction  Prohibited. 



• A number  of  measurement  aids  are  provided  for  all  DL/I  products.  These 
range  from  statistics  produced  as  a result  of  the  various  DL/I  utilities  used  for 
loading  and  reorganizing  data  bases  to  more  comprehensive  statistics  produced 
for  IMS/VS  by  a Data  Base  Monitor  and  a Data  Communication  Monitor,  both 
of  which  are  integral  parts  of  IMS/VS. 

• Documentation and control aids are provided separately from the DL/I
products.


The  Data  Dictionary  can  be  used  for  DL/I  DOS/VS  and  IMS/VS.  It  runs 
primarily  in  a batch  environment.  For  IMS/VS,  an  on-line  facility  is 
also  provided. 

The  Data  Dictionary  maintains,  through  its  own  DL/I  data  bases, 
information  relating  to  all  fields,  segments,  programs,  applications, 
transactions,  and  systems  in  the  installation. 

It  accepts  as  input  the  output  produced  by  the  Data  Base  Design  Aid  as 
well  as  existing  COBOL  structures  together  with  existing  DBDs  and 
PSBs. 

This  input,  together  with  additional  input  entered  in  batch  mode  or  (for 
IMS/VS)  on-line,  is  used  to  create  and  maintain  a dictionary  of 
information  for  the  installation. 


As a result of updates made to the Data Dictionary data bases, the
updated COBOL structures, PL/I structures, and updated DBDs and
PSBs are produced.

No data base restructuring aids are provided for DL/I ENTRY apart from the
Copy/Reload utility. However, the Data Base Design Aid can be used to assist
in designing a new data base structure.

Utilities are provided for the other two DL/I products that assist in unloading
and reloading the data base or reorganizing the data base.

In  addition,  all  DL/I  products  provide  utilities  that  aid  in  establishing  new  data 
relationships  (logical  relationships)  and  adding  or  changing  secondary  indexes. 

Conversion aids are provided for all three DL/I products. Each product
permits existing key sequenced VSAM files to be defined as simple HISAM
(SHISAM) data bases. This permits VSAM files to be migrated to DL/I very
easily but does not permit all of the DL/I benefits to be gained until such
VSAM records are restructured as a hierarchy of segments.

A bridge is available to convert previous IBM chained file data bases to DL/I
data bases. These chained file data bases may have been created and
maintained by the Bill of Material Processor (BOMP), the Data Base
Organization Maintenance Processor (DBOMP) or the Chain File Management
System (CFMS).

The  first  two  (BOMP  and  DBOMP)  are  DOS  chained  file  data  bases, 
while  the  latter  (CFMS)  is  an  OS  data  base. 

Conversion  is  accomplished  by  the  Chain  File  - DL/I  Bridge  program 
product  which  converts  the  existing  chained  file  (i.e.,  network)  data 
bases  efficiently  to  DL/I  data  bases. 


In addition, an interface is provided to intercept the various chained
file CALLS issued by application programs and instead issue DL/I
CALLS to the DL/I data base.


IBM provides extensive external and internal education covering Concepts,
Application Design, Data Base Design, Application Programming and
Installation. The documentation provided by IBM is quite comprehensive and
encompasses:

General Information Manual.


System/Application  Design  Guide  (for  Systems  Analysts). 

Application  Programmers  Reference  Manual. 

System  Programmers  Reference  Manual. 

Utilities Manual (for System Programmers and Data Base
Administrators).

Operations  Guide  (for  terminal  and  system  operators). 

User's  Guides  (Introductory  Manuals  for  first  time  users). 

Logic  Manuals  (provided  on  payment  of  the  licence  fee  for  the 
particular  DL/I  product). 

b.  Application  Programmer 

• The Data Manipulation Language (DML) used for all DL/I products is a CALL
macro interface with DL/I. CALL macros are issued from Assembler, COBOL
and PL/I programs requesting DL/I activity.

• IMS/VS enables a separate procedural language interface to be used with
COBOL or PL/I which bypasses the need for the application programmer to
issue CALL macros. These program interfaces are provided as separate
chargeable programs from IMS/VS and are referred to as COBIMS and PLIMS.

• The two DOS/VS DL/I products offer the application programmer nine
different DL/I CALL functions, while IMS/VS offers thirteen different CALL
functions.

This is indicative of the degree of function carried out by DL/I on
behalf of the application program and results in significant simplicity in
writing DL/I application programs.

In  addition,  the  application  program  (because  of  the  simple  program 
interface)  is  generally  unaffected  by  most  data  base  modification  or 
reorganization. 

DL/I  ENTRY  permits  only  one  record  type  (segment)  to  be  retrieved  per 
command  while  the  other  two  DL/I  products  permit  "Path  CALLS"  to  be 
issued  by  the  one  CALL  statement,  which  retrieves  all  segments  in  a 
hierarchical  path  through  the  data  base.  Thus,  up  to  15  segments  may  be 
retrieved  in  one  path  call,  concatenated  and  presented  to  the  application 
program  as  one  consolidated  record. 
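
What a path call returns can be pictured with a short sketch. It assumes a toy representation that is not DL/I's: each segment is a dictionary with hypothetical `type`, `key`, `data` and `children` entries, and `path_call` is an illustrative function name. Only the limit of 15 segments comes from the text.

```python
MAX_PATH_SEGMENTS = 15  # limit quoted in the text for one path call


def path_call(segments, path):
    """Return the concatenated data of the segments along `path`, or None.

    `segments` is a list of root segments; `path` is a list of
    (segment_type, key) pairs, one per level of the hierarchy.
    """
    if len(path) > MAX_PATH_SEGMENTS:
        raise ValueError("a path call may retrieve at most 15 segments")
    parts = []
    level = segments
    for segment_type, key in path:
        match = next(
            (s for s in level if s["type"] == segment_type and s["key"] == key),
            None,
        )
        if match is None:
            return None                      # no such path in the data base
        parts.append(match["data"])
        level = match.get("children", [])
    return "".join(parts)


data_base = [
    {"type": "CUSTOMER", "key": "C1", "data": "ACME|", "children": [
        {"type": "ORDER", "key": "O7", "data": "ORDER-7|", "children": [
            {"type": "ITEM", "key": "I2", "data": "WIDGET"},
        ]},
    ]},
]

record = path_call(
    data_base, [("CUSTOMER", "C1"), ("ORDER", "O7"), ("ITEM", "I2")]
)
missing = path_call(data_base, [("CUSTOMER", "C1"), ("ORDER", "O9")])
```

One request yields the customer, order and item data as a single consolidated record, where a product limited to one segment per command would need three separate requests.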


The DL/I products provide extensive data search capabilities. Each of the
three products permits the application program to request segments based on
high, low or equal comparisons (or combinations of comparisons, such as
greater than or equal, less than or equal, etc.).


In  addition,  DL/I  transparently  will  apply  the  requested  search  logic  to 
fields  either  within  the  data  base  structure  requested  or  alternatively 
secondary  indexes  that  are  organized  in  sequence  based  on  the  fields 
requested  to  be  searched. 


This data search logic carried out by DL/I permits multiple record
types to be searched and retrieved for all three products.

IMS/VS additionally provides Boolean search logic, which enables the
application program to specify AND, OR and NOT logic whereby various field
search comparisons can be combined to give a single result.
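
The way field comparisons combine into a single result can be sketched in a few lines. This is an illustration of the logic only, not of DL/I call syntax: the helper names `compare`, `AND`, `OR` and `NOT`, and the sample employee segments, are hypothetical.

```python
import operator

COMPARISONS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def compare(field, op, value):
    """Build a test of one field against a value."""
    return lambda segment: COMPARISONS[op](segment[field], value)


def AND(*tests):
    return lambda segment: all(test(segment) for test in tests)


def OR(*tests):
    return lambda segment: any(test(segment) for test in tests)


def NOT(test):
    return lambda segment: not test(segment)


employees = [
    {"name": "A", "dept": "D1", "salary": 600, "grade": 1},
    {"name": "B", "dept": "D1", "salary": 400, "grade": 2},
    {"name": "C", "dept": "D1", "salary": 400, "grade": 5},
    {"name": "D", "dept": "D2", "salary": 900, "grade": 5},
]

# dept = D1 AND (salary >= 500 OR NOT grade < 3)
query = AND(
    compare("dept", "=", "D1"),
    OR(compare("salary", ">=", 500), NOT(compare("grade", "<", 3))),
)
selected = [e["name"] for e in employees if query(e)]
```

The simple comparisons correspond to what the text attributes to all three products; the combination with AND, OR and NOT corresponds to the additional capability attributed to IMS/VS.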
c. End User


• Each  of  the  three  DL/I  products  provides  a simple  user  language,  GIS 
(Generalized  Information  System),  which  is  an  English-type  query  language. 

GIS  is  available  for  both  DOS/VS  and  OS/VS  (GIS  DOS/VS  and  GIS/VS 
respectively). 

Both  products  execute  in  a batch  environment  and  accept  as  input 
English  language  query  programs.  These  are  compiled  to  executable 
code  and  then  immediately  executed  to  produce  the  requested  results. 

Additionally, both GIS products generate (as a result of compilation) all
necessary Job Control Language statements, together with all necessary
Sort Control statements. They produce a job comprising several
job steps: extract requested records from the DL/I data bases, list
and/or save parts of those records in intermediate work files, sort those
records into various specified sequences, and produce final reports of
selected sorted records.

The  GIS  products  require  little  prior  DP  knowledge  to  code,  compile  or 
execute  queries.  Additionally,  a previously  compiled  query  can  be 
saved  for  re-execution  if  required. 

• The simplicity of GIS, together with its compilation and execution
performance, which is comparable to COBOL and PL/I application programs
written to access DL/I data bases, results in it being used not only for
spontaneous or ad-hoc queries but also for simple production reports that
are processed regularly.

• GIS  DOS/VS  supports  a query  capability  only,  permitting  retrieval  of  data  from 
DL/I  data  bases  or  key  sequenced  VSAM  files  (defined  as  SHISAM  data  bases). 

• GIS/VS, however, provides the ability to retrieve (query), create, update and
delete DL/I data bases or standard OS/VS files.

While both GIS products compile and process in a batch environment, an
on-line interface is provided for GIS DOS/VS through the use of CICS/DOS/VS
(Customer Information Control System/Virtual Storage) and the CICS Source
Program Maintenance II program.

GIS/VS  has  an  on-line  interface  that  is  provided  by  the  Advanced  Query 
Feature  of  GIS/VS  and  uses  the  IMS/VS  Data  Communication  Feature 
for  on-line  support. 

The  on-line  facilities  available  to  both  GIS  products  permit  GIS  queries 
to  be  entered  via  on-line  terminals,  saved  and  modified  on-line,  and 
then  submitted  on-line  for  batch  compilation  and  execution. 


The  reports  resulting  from  this  batch  execution  can  be  returned  to  the 
terminal  and  displayed  and  then  subsequently  produced  as  hard  copy  on 
either  a printer  attached  to  the  on-line  display  terminal  cluster  or  on 
the  system  printer. 

Additionally,  GIS/AQF  provides  an  on-line  syntax  checker  to  permit  any 
coding  errors  to  be  detected  and  corrected  before  the  GIS  query  is 
submitted  for  batch  compilation  and  execution. 


CICS,  GIS  and  IMS  are  the  prime  Data  Base/Data  Communications  products 
offered  by  IBM.  An  increasing  number  of  application  program  products  are 
becoming  available  which  are  based  on  the  use  of  CICS/VS  or  IMS/VS  DC  for 
on-line  support  and  one  or  more  of  the  DL/I  products  for  data  base  support. 

PLANCODE/I  (Planning,  Control  and  Decision  Evaluation  System): 
PLANCODE  is  available  in  a batch  or  on-line  environment  (under 
control  of  CICS/DOS/VS  or  CICS/OS/VS)  and  is  used  for  financial 
planning  and  budgetary  control. 


DMS/VS (Display Management System/VS): DMS/VS is an on-line
application generator, which uses a fill-in-the-blanks approach and
permits on-line applications to be implemented directly from the
systems design specifications. It runs under control of CICS/DOS/VS or
CICS/OS/VS and supports not only standard VSAM files but also DL/I
data bases controlled by all three DL/I products.

ATMS  (Advanced  Text  Management  System):  ATMS  provides  an  on-line 
text  processing  and  retrieval  system  for  entry,  modification,  and  final 
output  of  textual  documents  and  reports.  It  runs  under  control  of 
CICS/DOS/VS  and  CICS/OS/VS. 

STAIRS/VS (Storage And Information Retrieval System): STAIRS/VS is
an information retrieval program which supports the loading,
maintenance and retrieval of text document data bases, together with DL/I
data bases (for OS/VS). Abstracts or complete documents can be stored
in data bases, and STAIRS can retrieve all relevant documents which
contain within them keywords specified by the terminal operator in
various logical combinations. STAIRS runs under control of
CICS/DOS/VS, CICS/OS/VS or IMS/VS DC. Additionally, the OS/VS
version permits access to IMS/VS DL/I data bases.

ITS (Interactive Training System): ITS is a Computer Assisted
Instruction program product, which permits individualized student instruction
from an on-line terminal. Additionally, it provides a Course Authoring
feature, which enables authors to develop courses without requiring any
programming knowledge. ITS runs under control of CICS/OS/VS or
IMS/VS DC.

Additionally, a number of application programs are available for
specific industries or applications based on CICS, DL/I or IMS/VS.
Some of these programs include a Customer Information File system for
banking, Customer Information System for utilities, a Production
Information Control System for manufacturing, a Life Insurance Package, a
Transportation package, a Health Care Support package and many
others.

8. COST/PERFORMANCE



a.  Measurable  Costs 


• The  DL/I  products  are  available  on  a rental  basis  only,  which  includes 
maintenance  charges.  No  charge  is  made  for  installation.  However,  IBM 
provides  additional  System  Engineering  Services  support  for  a fee  to  assist  in 
development  of  applications  that  use  DB/DC  products. 


• The cost of the various DL/I products is:

Product                                       Rental Per Month (approx.)

DL/I ENTRY                                    $  350
DL/I DOS/VS                                   $  325
IMS/VS DB (batch)                             $  750
CICS/DOS/VS                                   $  480
CICS/OS/VS                                    $  805
IMS/VS DC (on-line)                           $  910
GIS DOS/VS (batch)                            $  410
GIS/VS (batch)                                $1,095
GIS/VS+AQF (query only)                       $1,390
GIS/VS+AQF (query, update, create, modify)    $2,410
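
As a worked example of using the rental figures above, the monthly charge for a combination of products can be totalled as below. The rentals are copied from the table; the function name `monthly_rental` is hypothetical, and the sketch assumes that the charges are simply additive, which the report does not state explicitly.

```python
# Approximate monthly rentals in dollars, as listed in the table.
MONTHLY_RENTAL = {
    "DL/I ENTRY": 350,
    "DL/I DOS/VS": 325,
    "IMS/VS DB (batch)": 750,
    "CICS/DOS/VS": 480,
    "CICS/OS/VS": 805,
    "IMS/VS DC (on-line)": 910,
    "GIS DOS/VS (batch)": 410,
    "GIS/VS (batch)": 1095,
    "GIS/VS+AQF (query only)": 1390,
    "GIS/VS+AQF (query, update, create, modify)": 2410,
}


def monthly_rental(products):
    """Total monthly rental, assuming the listed charges are additive."""
    return sum(MONTHLY_RENTAL[product] for product in products)


# A DOS/VS installation with data base, on-line and query support.
dos_config = monthly_rental(
    ["DL/I DOS/VS", "CICS/DOS/VS", "GIS DOS/VS (batch)"]
)

# An OS/VS installation using IMS/VS for both data base and on-line support.
os_config = monthly_rental(["IMS/VS DB (batch)", "IMS/VS DC (on-line)"])
```

Under the additive assumption, the DOS/VS combination comes to about $1,215 per month and the IMS/VS DB plus DC combination to about $1,660 per month.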

b.  Real  Memory 

• The  real  memory  required  by  each  of  the  DB/DC  products  varies  depending  on 
the  complexity  of  the  particular  system  designed  and  the  degree  of  function 
and  options  utilized  by  the  installation. 

• The actual storage required can be estimated from the various System
Programmer's Reference Manuals for each of the different DB/DC products.

• The following representative real storage requirements may be used as a
guide. They indicate the approximate real storage requirement for the first
user of the product together with the approximate storage required for each
additional user (which may be an additional program run in a different batch
partition or an additional terminal user in an on-line environment).

                        Real Storage

Product           First User    Additional User

DL/I ENTRY           16K              4K
DL/I DOS/VS          80K             30K
IMS/VS DB           130K             20K
CICS/DOS/VS          60K              4K
CICS/OS/VS           90K              4K
IMS/VS DC           200K             30K
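
The first-user and additional-user figures above lend themselves to a simple estimate. The sketch below assumes that storage grows linearly with the number of users, which is only a rough reading of the guide figures; the function name `real_storage_k` is hypothetical.

```python
# (first user, each additional user) in K bytes, from the table.
REAL_STORAGE_K = {
    "DL/I ENTRY": (16, 4),
    "DL/I DOS/VS": (80, 30),
    "IMS/VS DB": (130, 20),
    "CICS/DOS/VS": (60, 4),
    "CICS/OS/VS": (90, 4),
    "IMS/VS DC": (200, 30),
}


def real_storage_k(product, users):
    """Estimate real storage in K: first user plus each additional user."""
    if users < 1:
        raise ValueError("at least one user is required")
    first, additional = REAL_STORAGE_K[product]
    return first + (users - 1) * additional


ims_dc_ten_users = real_storage_k("IMS/VS DC", 10)        # 200 + 9 * 30
cics_dos_twenty_users = real_storage_k("CICS/DOS/VS", 20)  # 60 + 19 * 4
```

On these assumptions, ten IMS/VS DC users need roughly 470K and twenty CICS/DOS/VS users roughly 136K; the System Programmer's Reference Manuals remain the source for an actual estimate.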

c.  Performance  Constraints 


• Each DL/I product supports multi-thread (multi-tasked) processing for
optimum performance.

In  a batch  environment,  this  multi-thread  processing  is  provided  through 
use  of  the  particular  operating  system  multi-programming  facilities. 

In  an  on-line  environment,  the  multi-tasking  support  is  provided  by  the 
particular  on-line  monitor,  CICS/VS  or  IMS/VS  DC. 

• CICS/VS  supports  a maximum  of  eight  concurrent  tasks  when  operating  against 
DL/I  ENTRY  data  bases,  255  concurrent  tasks  for  DL/I  DOS/VS  data  bases 
and  15  concurrent  tasks  for  DL/I  OS/VS  (IMS/VS  DB)  data  bases. 

• To protect data integrity against the possibility of simultaneous updates,
the three DL/I products provide exclusive control lockout. Across different
partitions (inter-partition), DL/I ENTRY provides lockout at the dataset level,
while IMS/VS DB and DL/I DOS/VS provide lockout at the repeating group
occurrence (segment occurrence) level for potentially improved performance.

Within the same partition (intra-partition), DL/I ENTRY and DL/I DOS/VS
provide lockout at the repeating group occurrence (segment occurrence) level.
Similarly, IMS/VS DC supports lockout at the repeating group occurrence
(segment occurrence) level. In the case of CICS/OS/VS access to IMS/VS data
bases, lockout is provided at the repeating group type (segment type) level.


DL/I ENTRY supports buffer management through the facilities offered by
VSAM. The other two DL/I products, however, provide a common DL/I buffer
pool in addition to the VSAM buffering facilities, to optimize I/O performance.


Another  element  that  influences  input/output  is  the  support  for  data  grouping. 
DL/I  ENTRY  supports  a maximum  of  63  repeating  group  types  (segment  types) 
per  data  set,  while  the  other  two  DL/I  products  support  a maximum  of  255 
segment  types  per  data  set. 

Each of the two DOS/VS DL/I products supports only one data set per data
base, while IMS/VS supports up to 10 data sets per data base. This enables only
part of the data base to be used for processing, and those data sets not
required need not be mounted.


Data  grouping  support  for  all  of  the  DL/I  products  is  such  that  the  Data  Base 
Administrator  can  select  an  appropriate  logical  record,  block  length,  access 
method  and  pointer  relationships  that  will  permit  related  records  to  reside 
either  in  the  same  physical  record  (physical  block)  or  in  the  minimum 
necessary  physical  records. 

The structure of DL/I enables all the related segments of the same
data base record (within the same physical data set) to reside as close
as possible to each other, so minimizing the necessity for I/O.

This is probably the most significant advantage of the DL/I products
over the pure network (chain file) data base systems, which generally
require additional I/O accesses to retrieve related records (segments)
that must reside in different data sets (files) because of the
architecture generally used to support network structures.

• DL/I ENTRY supports symbolic relationships when establishing logical
relationships between data bases or data base records or segments. The other
DL/I products support symbolic pointers but also support direct pointers to
establish logical relationships between different segments in the same or
different data base records.

• All three products support sequential access methods, indexed access methods
and random access methods (HSAM, HISAM and HDAM). Additionally, DL/I
DOS/VS and DL/I OS/VS support an indexed direct access method (HIDAM),
which offers the performance advantages of a direct access method with the
indexing capability of an indexed access method.

• DL/I  ENTRY  is  provided  as  a subset  DL/I  product  for  DOS/VS  users. 
Additionally,  only  part  of  the  function  need  be  used  to  ensure  the  shortest 
possible  instruction  path  length  in  transaction  processing  where  performance  is 
a key  consideration. 

• DL/I  DOS/VS  and  IMS/VS  permit  a subset  of  the  full  function  of  each  product 
to  be  defined  and  generated  to  minimize  the  instruction  path  length  for 
performance  considerations. 

• In  addition,  IMS/VS  provides  a Fast  Path  capability  that  is  intended  for  very 
high  performance  applications  such  as  on-line  data  entry.  This  supports  a Data 
Entry  Access  Method  (DEAM)  for  rapid  processing  and  editing  of  on-line 
transactions. 

To assist this editing, the IMS/VS Fast Path feature also supports In
Memory Data Bases (IMDB) whereby data bases that have to be
accessed in a high performance environment can be maintained in
virtual storage rather than on disk storage.



IMS/VS  Fast  Path  permits  update,  addition  and  deletion  capability  for 
DEAM  and  IMDB  data  bases  with  full  recovery  and  data  integrity. 


9.  DISTRIBUTED  PROCESSING 


• All IBM DB/DC products support distributed processing, with the on-line
products (CICS/VS and IMS/VS DC) supporting local and remote intelligent
controllers (IBM "minicomputers") through IBM's Systems Network Architecture
(SNA).


• Distributed  processing  is  supported  by  the  Virtual  Telecommunication  Access 
Method  for  CICS/DOS/VS,  CICS/OS/VS  and  IMS/VS  DC.  Additionally,  a subset 
of  the  function  provided  by  VTAM  is  available  for  CICS/DOS/VS  users.  This 
subset  is  referred  to  as  the  Extended  Telecommunications  Module  (EXTM)  and 
is  a separate  chargeable  feature  available  for  use  with  CICS/DOS/VS. 


• The  intelligent  controllers  supported  by  CICS/VS  and  IMS/VS  DC  include  the 
following  communication  systems: 


IBM  3600  Finance  Communication  System  (for  banking  and  finance). 


IBM  3650  Retail  Store  System  (for  point  of  sale  retail  applications). 


IBM 3790 Communication System (for cross-industry distributed
processing).

IBM  3767  Communication  Terminal  (printer/keyboard  terminal). 

IBM 3770 Communication Terminal (a family of RJE-type terminals).

IBM 3270 Information Display System (SNA version of the 3270).

• Several of these terminal systems provide not only program logic but also disk
storage capacity ranging from 500K bytes to 29 megabytes. Additionally,
these programmable controllers themselves control the operation of a large
number of terminals ranging from 50 to approximately 200 terminals
depending on the particular controller.

• The programmability of these minicomputers permits standalone operation
without reference to the host CPU. This enables local processing to be carried
out for attached terminals, retrieving requested information from disk files
maintained in the programmable controller.

• The facilities offered by these "minicomputers" are such that reference need
only be made to the host to process requests for information that is retained
only in the host data base (perhaps for reasons of size). At all other times,
these minicomputers are capable of standalone processing, resulting in
minimum line costs.

• The amount of processing distributed from the host to the remote
minicomputers may be such as to permit their standalone operation during
normal daily processing, with communication to the host at night to transmit
summary information relating to the day's activities.

• Functions  previously  carried  out  in  the  host,  such  as  error  recovery,  can  be 
distributed  to  the  various  network  components.  Here  the  programmable 
Communications  Controller  attached  to  the  host  communicates  with  remote 
programmable  controllers  to  identify  and  correct  transmission  errors. 

• A further advantage of distributed processing with IBM minicomputers is the
ability for the host and minicomputer to identify any lost messages as a result
of a system failure and resynchronize both ends of the communications link by
retransmitting lost messages if necessary. In this way, full message integrity
can be assured.


10. DISTRIBUTED DATA BASE


IBM  has  indicated  that  its  future  Data  Base/Data  Communication  strategy  is 
directed  towards  distributed  processing  and  also  distributed  data  base.  The 
IBM  minicomputers  and  DB/DC  software  support  now  available  indicate  the 
commitment  IBM  has  towards  the  distributed  processing  concept. 

Additionally,  IMS/VS  DC  provides  a feature,  the  Multiple  Systems  Coupling 
Feature  (MSCF),  which  supports  the  local  or  remote  attachment  of  up  to  255 
CPUs.  This  permits  several  CPUs  to  participate  in  the  processing  of 
transactions.  Thus,  with  locally  attached  CPUs,  a large  on-line  workload  can 
be  shared  among  several  CPUs. 

The  main  advantage  of  MSCF  is  in  the  support  of  geographically  dispersed  data 
bases. 


The  distributed  data  base  support  provided  by  IBM  is  a pointer  to  possible 
future  developments  in  the  area  of  IBM  DB/DC  and  indicates  the  degree  of 
commitment  which  IBM  has  now  made  to  the  distributed  processing  and 
distributed  data  base  concepts. 

G.  SYSTEM  2000  (MRI  SYSTEMS) 


1. GENERAL DESCRIPTION

System  2000  was  first  released  in  July,  1970,  by  MRI  Systems  in  Austin,  Texas. 
System 2000 can be used on IBM System/360 and System/370, Univac 1100
series and CDC 6000, CYBER 70 and CYBER 170 series computers.

System  2000  is  also  provided  as  a Data  Base  Management  System  Service 
through  the  Infonet  Remote  Computing  Service  of  CSC  and  the  Cybernet 
Remote  Computing  Service  of  CDC. 

System  2000  is  provided  as  a basic  system  with  a number  of  additional 
features. 

a.  Basic  System  2000 

Basic  System  2000  supports  the  definition  of  new  data  bases,  modification  of 
the  definition  of  existing  data  bases,  and  retrieval  and  update  of  information 
in  those  data  bases. 

System  2000  is  an  Inverted  List  DBMS  based  on  a hierarchical  structure. 

The  basic  components  of  a data  base  definition  are  the  data  elements  (fields), 
which  contain  data  values.  Repeating  groups  are  defined  in  System  2000  using 
hierarchical  nesting  down  to  32  levels  of  the  data  base  definition. 

Data  elements  (fields)  and  logical  entries  (records)  may  vary  in  length.  Any 
data  elements  can  be  inverted  and  used  as  key  fields  for  accessing  the  data 
base. 

Basic System 2000 provides data security by password control to the data base
as well as additional password control for each component (data element or
repeating group).


Support  is  provided  for  data  base  archiving  as  well  as  recording  an  audit  trail 
of  changes  made  to  the  data  base.  This  audit  trail  can  be  used  to  reconstruct 
the  data  base  by  applying  it  completely  or  in  part  to  an  archival  copy  of  that 
data  base. 

b.  System  2000  Additional  Features 


The  Procedural  Language  Interface  (PLI)  permits  access  to  a System  2000  data 
base  from  programming  languages  such  as  COBOL,  FORTRAN  or  Assembly 
language.  Programs  written  in  these  languages  are  able  to  address  any  part  of 
the  data  base,  retrieve  data  in  a sequence  and  format  suitable  for  processing, 
and  update  the  data  base  from  the  program. 

Support  is  provided  in  the  Univac  version  only  for  establishing  inter- 
relationships (Links)  between  two  or  more  data  bases,  enabling  limited 
network  data  structures  to  be  defined. 


Data  base  qualification  is  performed  by  use  of  an  index  file  for  inverted 
list  access.  Retrieved  data  can  be  sorted  by  one  or  more  keys  prior  to 
return  of  the  first  set  of  data  to  the  program. 

Immediate  Access  Feature  provides  a query  language  that  enables  a user  to 
express  requests  for  retrieval  or  updating  of  a data  base.  The  Immediate 
Access  language  is  similar  to  English  and  is  particularly  suited  for  interactive 
use  from  remote  terminals. 

Report  Writer  Feature  provides  a reporting  language  that  enables  the  user  to 
prepare  report  definitions  in  which  he  may: 

Generate  breakpoints  on  any  data  base  element  or  repeating  group. 

Specify headings and footings for each physical page, logical page, and
the report as a whole.

Control  printing,  item  inclusion,  page  ejection,  and  the  accumulation  of 
sub  or  grand  totals  dynamically  via  conditional  statements. 

Generate  up  to  100  reports  from  a single  pass  of  the  data  base  index 
files. 

• Sequential  File  Feature  is  supported  for  Univac  installations  only  and  allows 
the  System  2000  user  to  create  and  access  data  bases  residing  on  magnetic 
tape.  Such  sequential  data  bases  can  be  accessed  either  by  the  Report  Writer 
feature  or  a Procedural  Language  Program. 

A link  capability  within  the  Procedural  Language  Interface  permits  the 
user  to  establish  logical  associations  between  physically  separate  data 
bases. 

A sequential  data  base  may  thus  be  viewed  as  a logical  extension  of  a 
direct  data  base. 

• Teleprocessing  Monitor  Feature  (IBM  version)  - TP  2000  is  a data  communi- 
cations software  package  that  can  be  used  in  conjunction  with  System  2000  to 
provide  data  base  access  via  either  Immediate  Access  or  the  Procedural 
Language  Feature.  TP  2000: 

Is supported for IBM installations only and may be used with OS/370
data files alone or with a combination of OS/370 data files and
System 2000 data bases.

Interfaces  with  a large  number  of  communication  terminals,  including 
TTY,  TTY-compatible  devices,  IBM  2741,  2260,  3270  or  compatible 
devices.  TP  2000  is  multi-tasked,  and  the  execution  of  application 
programs  may  be  multi-threaded. 

Has been interfaced with commercially available teleprocessing
monitors such as TCAM, ALPHA, HYPERFASTER, and BEST. MRI will
interface System 2000 to other teleprocessing monitors for a nominal
fee.


• Multi-User  Feature  (IBM  version)  permits  communication  between  one  or  more 
regions  or  partitions  and  a single  copy  of  System  2000.  Immediate  Access 
commands  from  the  teleprocessing  monitor  and  System  2000  requests  from 
procedural  language  programs  in  other  partitions  or  regions  are  processed  with 
simultaneous  use  of  one  or  more  data  bases.  Data  bases  are  protected  from 
simultaneous  updates  in  this  environment. 


• Multi-Thread  Feature  (IBM  version)  is  designed  to  support  installations  where 
teleprocessing  demand  rates  exceed  the  throughput  capabilities  of  the  stan- 
dard single-thread  system.  The  Multiple  Thread  Feature  queues  input  and 
output  terminal  messages  and  controls  the  concurrent  processing  of  multiple 
System  2000  commands.  The  user  declares  the  number  of  threads,  from  one  to 
eight,  based  on  demand  rate,  core  availability  and  peripheral  storage  consider- 
ations. 


• Queued  Access  Module  is  part  of  the  Basic  System  2000  and  carries  out 
retrieval,  update,  and  data  loading  operations  in  an  interactive  or  batch 
processing  environment. 

It accepts commands similar to those of the Immediate Access Feature,
analyzes all of the selection criteria for a number of Queued Access
commands and extracts the requested data by a single sequential pass
through the data base.

This  provides  higher  performance  than  Immediate  Access  when  a 
number  of  selections  are  to  be  made  at  the  same  time  from  the  data 
base. 

c. Data Definition Language (DDL)


A System  2000  data  base  is  defined  as  a hierarchically  organized  structure  or 
"data  tree." 

System  2000  supports  seven  different  data  types.  These  are: 


- REPEATING GROUP:  A data set name.

- NAME:             Any alphanumeric data (250 characters maximum) with all
                    leading, trailing, and extraneous blanks discarded.

- TEXT:             Any alphanumeric data (250 characters maximum) with all
                    blanks being retained in the data.

- DATE:             A fixed format statement of MM/DD/YY or MM/DD/YYYY
                    standing for month, day and year.

- INTEGER NUMBER:   A string of numerals (0-9), plus sign, restricted by a
                    picture designation.

- DECIMAL NUMBER:   A positive or negative string of numbers with a decimal
                    point.

- MONEY NUMBER:     Same characteristics as for DECIMAL NUMBER but displayed
                    with a $ sign, decimal numbers and CR for credit (if
                    negative).

NAME and TEXT values may occupy up to 250 characters, as indicated above.
However,  System  2000  enables  these  fields  to  be  defined  in  the  data  base  with 
the  most  likely  number  of  characters  and  then  automatically  overflows  any 
additional  characters  beyond  this  most  likely  number,  up  to  the  maximum  of 
250  characters. 
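
The overflow behavior described above can be illustrated with a minimal Python sketch. It is not MRI's storage format: the function names `split_value` and `join_value` and the declared length of 10 are hypothetical and chosen only for illustration. The sketch shows a value being divided into the part held at the declared "most likely" length and the part that would go to overflow storage.

```python
MAX_CHARS = 250  # maximum NAME/TEXT length stated in the report


def split_value(value: str, declared_len: int) -> tuple:
    """Split a value into the part kept at the declared length and the overflow."""
    if len(value) > MAX_CHARS:
        raise ValueError("NAME/TEXT values are limited to 250 characters")
    return value[:declared_len], value[declared_len:]


def join_value(stored: str, overflow: str) -> str:
    """Rebuild the full value from the stored part and its overflow part."""
    return stored + overflow


# A short value fits entirely; a long one spills into the overflow part.
short_parts = split_value("ADAMS", declared_len=10)
long_parts = split_value("WOLFESCHLEGELSTEIN", declared_len=10)
print(short_parts, long_parts)
```

Choosing the declared length close to the typical value length keeps most values out of overflow storage, which is the trade-off the paragraph above describes.
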

Unless explicitly specified as NON-KEY, all data elements are
automatically indexed: each unique value of a data element (field) in the
data base is inverted, permitting System 2000 to identify quickly, through
its inverted indexes, all of the data base records containing the data
element value of interest.


Fields  that  are  explicitly  stated  as  KEY  fields  (or  that  default  to  KEY  fields) 
can  subsequently  be  referenced  in  the  WHERE  clause  in  the  Immediate  Access, 
Queued  Access  or  Procedural  Language  Interface  (PLI)  Data  Manipulation 
Languages  (DML).  The  WHERE  clause  permits  selection  criteria  to  be  applied 
against  the  various  logical  entries  (records)  in  the  data  base. 

Each  component  (Repeating  Group  or  Data  Element)  in  the  data  base  defini- 
tion is  numbered,  with  a defined  separator  character  (typically  *)  between  the 
"component  number"  and  the  component  definition. 


d.  Data  Manipulation  Language 

The  System  2000  Data  Manipulation  Language  (DML)  is  the  Procedural 
Language  Interface  (PLI). 


PLI  establishes  communication  between  a FORTRAN  or  COBOL  pro- 
gram and  a System  2000  data  base.  Using  PLI  commands,  data  may  be 
located,  retrieved,  modified  or  updated. 

A System  2000  pre-compiler  translates  the  PLI  statements  into  accept- 
able source  code  suitable  for  compilation  like  any  other  FORTRAN  or 
COBOL  program. 

e.  Immediate  And  Queued  Access  Features 


The Immediate and Queued Access features provide an end user Query
Language for System 2000. Both Immediate Access and Queued Access enable
end users to retrieve and update System 2000 data bases, in either an
interactive environment (Immediate Access) or a batch environment (Queued
Access).

When  operating  in  the  Immediate  Access  mode,  each  command  submitted  by 
the  user  is  processed  individually  regardless  of  the  processing  environment. 
Whether  the  user  is  linked  to  the  system  through  a remote  job  entry 
environment  or  through  an  interactive  environment,  each  command  is  inter- 
preted and  processed  before  the  next  command. 

The  Queued  Access  module  is  part  of  the  Basic  System  2000  product  while  the 
Immediate  Access  feature  is  an  additional  charged  option  as  is  the  Procedural 
Language  Interface. 

In  Queue  Processing,  the  entire  command  stream  (which  begins  with  the 
command  QUEUE  and  ends  with  the  command  TERMINATE)  is  read  before  any 
retrieval  or  updating  occurs. 

Each  command  is  scanned  for  syntactic  errors,  one  command  at  a time, 
until  the  TERMINATE  command  is  read,  at  which  time  command 
execution  takes  place. 

Once  the  command  stream  has  been  completely  examined,  all  the 
WHERE  clauses  are  processed  together.  After  the  selected  data  sets 
are  found,  operations  on  the  data  base  are  carried  out. 

The advantage of Queue Processing is that a number of commands are
analyzed and processed together, with only one scan of the data base
required.
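
The single-scan idea behind Queue Processing can be sketched in Python as follows. This is an illustrative model only, not System 2000 code: the function `run_queued`, the sample records and the two queued criteria are all hypothetical. Every queued selection is tested against each record during one sequential pass, instead of scanning the data once per command.

```python
def run_queued(records, queries):
    """Answer every queued query with one sequential pass over the records."""
    results = {name: [] for name in queries}
    for record in records:                      # the single scan
        for name, matches in queries.items():   # test each queued criterion
            if matches(record):
                results[name].append(record)
    return results


employees = [
    {"NAME": "ADAMS", "DEPT": "SALES", "SALARY": 900},
    {"NAME": "BAKER", "DEPT": "ENGINEERING", "SALARY": 1200},
    {"NAME": "CLARK", "DEPT": "SALES", "SALARY": 1500},
]
queued = {
    "sales_staff": lambda r: r["DEPT"] == "SALES",
    "high_salary": lambda r: r["SALARY"] > 1000,
}
answers = run_queued(employees, queued)
print([r["NAME"] for r in answers["sales_staff"]])
print([r["NAME"] for r in answers["high_salary"]])
```
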

f. Strings And User Functions


System 2000 permits the definition of both simple and extended strings, which
enable a user to call a previously defined command sequence to carry out
selection and calculation and return the result to the user. Strings are stored
within the data base definition and are allocated a component name.

An  Extended  String  permits  data  values  to  be  entered  at  execution  time  to  be 
used  in  resolving  the  data  value  for  the  String  component  number. 

A User  Function  permits  the  definition  of  a component  name  whose  value  will 
be  dynamically  calculated  at  execution  time  when  invoked,  based  on  the 
current  value  of  other  data  elements. 


Additionally, System 2000 provides a number of System Functions including
MAX/MIN, COUNT, SUM, AVG, and SIGMA.

g.  Report  Writer 


The  Report  Writer  Feature  of  System  2000  allows  the  user  to  define  and 
generate  as  many  as  100  formatted  reports  from  a single  scan  of  the  data 
base. 


The  user  defines  one  or  more  reports  specifying  what  data  base 
elements  are  to  be  included,  their  sort  order,  calculations  to  be 
performed  and  the  format  of  the  output  reports. 

When the definition phase is complete, the user requests execution of
the reports by issuing the GENERATE command.

This GENERATE command may contain a WHERE clause to restrict the
number of data sets qualified for this group of reports.

If there is no WHERE clause, a scan of the data base selects the data
elements required for all of the reports to be executed.

h.  Method  Of  Operation 

System  2000  uses  six  different  files  to  support  a data  base.  These  are 
implemented  using  direct  access  on  IBM,  Univac  and  CDC. 

The  Data  Base  Definition  File  is  a small  table  that  contains  the  data 
base  structure  definition  and  data  element  descriptions. 

For  each  key  data  element,  there  is  a pointer  to  an  appropriate  entry  in 
the  second  file  - the  Unique  Values  Table.  This  contains  one  entry  for 
each  unique  value  in  a keyed  element,  with  a pointer  to  a block  of 
addresses  in  the  third  file  - the  Values  Index. 

This  third  Values  Index  file  contains  blocks  of  addresses  into  the  fourth 
file  - the  Hierarchical  Location  Table. 

The  Hierarchical  Location  Table  contains  pointers  to  the  parent,  sibling 
and  child  data  elements  in  the  hierarchical  structure. 

There is also a direct pointer to the relevant data record in the Data
File, the fifth file, which contains the actual data records
themselves.

The  sixth  file  is  the  Overflow  File,  which  contains  those  NAME  and 
TEXT  data  element  values  that  overflow  the  specified  average  number 
of  characters  in  the  data  base  definition  for  those  data  elements.  The 
Unique  Values  Table  (file  2)  also  contains  a pointer  to  the  Overflow  File 
record for those NAME and TEXT data element values that exceed the
specified  average  number  of  characters  in  the  data  base  definition. 

System 2000 accesses the Data Base Definition File and retrieves the NAME
pointer information. This points to the appropriate section of the Unique
Values Table containing the various separate NAME values from the Data File.


• These  various  NAME  values  are  organized  in  ascending  sequence,  permitting  a 
binary  search  to  be  made  to  locate  the  value  of  interest  (ADAMS  in  this 
example). 

The  unique  value  of  ADAMS  then  points  to  the  appropriate  section  of 
the  Values  Index  file.  This  file  contains  a pointer  for  each  unique 
ADAMS  record  in  the  data  base. 


Each of these pointers points to an appropriate entry in the Hierarchical
Location Table.


The  Hierarchical  Location  Table  contains  pointers  to  the  Parent,  Sibling 
and  Child  data  elements  related  to  the  NAME  data  element  in  the 
structure.  Each  entry  also  contains  a pointer  to  the  relevant  ADAMS 
data  record  in  the  Data  File. 


For  those  NAME  data  records  where  the  particular  value  exceeds  the 
specified  number  of  characters  for  the  NAME  data  element  in  the  data 
base  definition,  a pointer  is  provided  in  the  data  record  to  the 
appropriate additional overflow characters in the Overflow File.

Similarly,  the  Unique  Values  Table  contains  the  pointer  to  the  Overflow 
File  for  each  NAME  value  which  exceeds  the  specified  number  of 
characters  in  the  data  base  definition. 
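
The chain of file accesses walked through above can be modelled with a small Python sketch. It is a simplified illustration under stated assumptions, not the actual System 2000 file layout: the dictionary and list structures, the function `find`, the addresses, and all sample data other than the ADAMS value are hypothetical, and the Overflow File is omitted. The sketch follows the same order as the text: definition file, Unique Values Table (binary search), Values Index, Hierarchical Location Table, Data File.

```python
from bisect import bisect_left

# File 5: Data File (record address -> record).
data_file = {100: {"NAME": "ADAMS", "CITY": "AUSTIN"},
             101: {"NAME": "BAKER", "CITY": "DALLAS"},
             102: {"NAME": "ADAMS", "CITY": "HOUSTON"}}

# File 4: Hierarchical Location Table (entry -> structural pointers + data address).
hierarchical_location = {0: {"parent": None, "data": 100},
                         1: {"parent": None, "data": 101},
                         2: {"parent": None, "data": 102}}

# File 3: Values Index (block number -> Hierarchical Location Table entries).
values_index = {0: [0, 2], 1: [1]}

# File 2: Unique Values Table for NAME, kept in ascending sequence.
unique_values = [("ADAMS", 0), ("BAKER", 1)]

# File 1: Data Base Definition File (key element -> its Unique Values Table).
definition = {"NAME": unique_values}


def find(element, value):
    """Follow the pointer chain from a key value down to its data records."""
    table = definition[element]
    keys = [v for v, _ in table]
    pos = bisect_left(keys, value)              # binary search on sorted values
    if pos == len(keys) or keys[pos] != value:
        return []                               # value not present in the index
    block = table[pos][1]
    return [data_file[hierarchical_location[entry]["data"]]
            for entry in values_index[block]]


print(find("NAME", "ADAMS"))
```

Note that a value absent from the Unique Values Table is rejected without touching the later files, which is why index-only resolution of selection criteria is inexpensive.
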


• The  direct  retrieval  of  a specific  record  requires  several  file  accesses. 
However,  as  with  most  Inverted  File  DBMS  products,  complex  selection 
criteria  may  be  resolved  primarily  by  accessing  the  Unique  Values  Table  for 
each  data  element  and  value  to  be  examined. 

Only for those records that satisfy the selection criteria is the Values
Index File accessed to identify each occurrence of the appropriate data
value, which then results in reference to the Hierarchical Location
Table and then the Data File (and possibly also the Overflow File).

Once  position  has  been  established  in  the  Hierarchical  Location  Table, 
the  various  hierarchical  pointers  provide  quick  access  to  related  hier- 
archical data  in  the  data  base. 

• As  with  any  Inverted  File  DBMS  that  identifies  volatile  data  elements  as  keys, 
considerable  index  maintenance  load  may  result  if  those  data  elements  change 
frequently,  as  those  changes  must  be  reflected  in  the  various  System  2000 
index  files. 

Best  performance  is  achieved  where  relatively  static  data  elements  are 
defined  as  key  fields  for  retrieval. 

The  WHERE  clause  in  the  Immediate  and  Queued  Access  features  and 
the  Procedural  Language  Interface  identifies  only  those  data  elements 
defined  as  KEY.  Such  selection  criteria  in  WHERE  clauses  are  resolved 
automatically  by  System  2000  through  access  to  its  various  indexes. 

• For  those  data  elements  that  are  volatile,  System  2000  enables  selection 
criteria  to  be  applied  using  the  IF  clause. 

• The  IF  clause  is  applied  by  System  2000  once  the  records  identified  by  the 
WHERE  clause  have  been  retrieved  from  the  data  base. 

• Thus,  the  Data  Base  Administrator  can  reach  a satisfactory  compromise 
between  complex  selection  criteria  and  both  static  and  volatile  data  elements 
by  judicious  use  of  the  WHERE  clause  (to  access  KEY  fields)  and  the  IF  clause 
(to  analyze  retrieved  records  according  to  the  selection  criteria). 
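
The division of labor between the WHERE clause and the IF clause can be sketched in Python as below. This is an illustrative model, not System 2000 syntax: the element names DEPT and BALANCE, the function `select`, and the sample records are hypothetical. The static element is indexed and resolved first; the volatile element is left unindexed and tested only on the records already retrieved.

```python
records = {
    1: {"DEPT": "SALES", "BALANCE": 250},
    2: {"DEPT": "SALES", "BALANCE": 40},
    3: {"DEPT": "ENGINEERING", "BALANCE": 900},
}

# Inverted index maintained only for the static KEY element DEPT.
dept_index = {}
for address, record in records.items():
    dept_index.setdefault(record["DEPT"], []).append(address)


def select(dept, min_balance):
    """Resolve the KEY criterion from the index, then filter retrieved records."""
    candidates = dept_index.get(dept, [])       # WHERE: answered by the index
    return [a for a in candidates               # IF: applied after retrieval
            if records[a]["BALANCE"] >= min_balance]


print(select("SALES", 100))
```

Because BALANCE is never indexed in this sketch, frequent changes to it cause no index maintenance, at the cost of examining every record that the indexed criterion returns.
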

2. BASIC FUNCTIONAL CAPABILITIES

a.  User  Accessibility 

• System  2000,  through  the  Procedural  Language  Interface  (PLI),  permits  access 
to  data  bases  from  Assembler,  COBOL  and  FORTRAN  programs. 

• End  user  languages  are  provided  by  the  Queue  Access  mode  of  Basic  System 
2000  for  high  performance  batch  retrieval  of  records,  satisfying  several  query 
commands  with  one  sequential  scan  of  the  data  base. 

• The  Immediate  Access  feature  provides  an  interactive  end  user  language 
whereby  each  query  is  immediately  analyzed  and  processed  against  the  data 
base.  The  Immediate  Access  feature  can  be  used  in  an  RJE  environment  as 
well  as  a remote  terminal  interactive  environment. 


• A Report  Writer  feature  is  provided  for  end  user  definition  of  complex 
formatted  reports. 


• System  2000  provides  TP  2000  Data  Communications  support  for  Immediate 
Access.  Other  Data  Communications  support  is  provided  for  CICS  and  TCAM. 


b.  Multiple  Views  Of  Data 

• System  2000  supports  sequential  retrieval,  random  retrieval,  indexed  retrieval 
and  multiple  indexed  retrieval. 

• The  Queue  Access  mode  provides  good  batch  sequential  performance  for  end 
users  while  the  PLI  provides  sequential  retrieval  capabilities  for  COBOL  and 
FORTRAN  Programs. 

• Random retrieval is provided through the various indexed levels of System
2000.

• Being an Inverted File DBMS, System 2000 provides its greatest flexibility in
indexed and multiple indexed retrieval. Unless explicitly identified as
NON-KEY, all data elements are assumed to be KEY and are automatically
incorporated in the System 2000 indexes.

The  WHERE  clause  of  System  2000  permits  retrieval  on  the  basis  of  up  to  25 
selection  criteria  against  up  to  10  "data  sets"  (Repeating  Groups),  resulting  in 
reference  to  the  various  System  2000  indexes  for  retrieval  only  of  those 
records  that  satisfy  the  criteria. 

c.  Data  Consolidation 

System 2000 supports up to 500 data elements in a data base. These may be a
mixture of Repeating Group (RG) components or data elements. Thus, the
number of RG types per entity (data base record or logical entry) is 500-N,
where N is the number of data elements defined.
There is no limit to the number of occurrences per RG type.

Variable  length  occurrences  are  supported  by  System  2000.  In  particular, 
NAME and TEXT data element types can each be up to 250 characters long
but  defined  in  the  data  base  description  according  to  their  most  likely  number 
of  characters.  Any  overflow  characters  beyond  this  most  likely  number  are 
automatically  stored  in  a separate  Overflow  File,  so  providing  a variable 
length  record  support  capability. 

A maximum  of  thirty-two  nested  levels  are  supported  in  a System  2000  data 
hierarchy. 

Relationships between various entities (data base records) are provided by the
LINK  feature  (supported  for  Univac  systems  only).  A Procedural  Language 
Interface  program  may  formulate  a request  involving  up  to  ten  records  using 
the  LINK  feature. 

Apart  from  this  LINK  feature,  no  entity  relationships  are  supported  across 
data  bases. 


© 1978  by  INPUT,  Menlo  Park,  CA  94025.  Reproduction  Prohibited. 



3. DATA INDEPENDENCE

a.  Levels  Of  Mapping 

• System  2000  provides  an  Internal  Map,  but  has  only  limited  support  for  an 
External  Map. 

• Conceptual  mapping,  to  transform  between  a physical  structure  in  the  data 
base  and  a logical  structure  provided  by  the  program,  is  not  supported  by 
System  2000. 

• Provision  is  made  in  PLI  Programs  to  specify  only  those  fields  required  in  a 
Repeating  Group  for  Field  Level  Definition.  However,  format  translation  is 
not  supported  to  convert  between  a physical  data  format  and  that  data  format 
required  by  a program. 


b.  Data  Base  Changes 

• The  degree  of  data  independence  supported  by  System  2000  can  be  evaluated 
by  activities  necessary  to  incorporate  a number  of  data  base  changes  as 
follows: 


- CHANGE DEVICE TYPE: The data base must be reloaded onto the appropriate
  device. Program recompilation or change is not required.

- CHANGE ACCESS METHOD: System 2000 supports only a direct access method for
  its various index files. No other access method change is permitted.

- CHANGE ENTITY VIEW: A changed entity view can be accomplished by specifying
  the new hierarchical structure in the data base definition. Depending on
  the extent of this hierarchical change and the use of GETA (Get Ancestor)
  and GETD (Get Descendant), logic change may be required.

- ADD NEW ENTITY: No program logic change, recompilation or data base reload
  is necessary to add a new data base record.

- ADD NEW RG TYPE: As each RG Type requires a System 2000 SCHEMA description
  of the data set (Repeating Group) in a PLI Program, changes to a program
  will be necessary with recompilation and reload of the data base.

- ADD NEW RELATIONSHIP: Relationships between different System 2000 data
  bases are not supported, except for the LINK feature for Univac.

- ADD NEW FIELD TO RG: As PLI Programs can specify only those data elements
  required, the addition of a new field to a Repeating Group may not require
  a logic change. However, this new field must be incorporated in the SCHEMA
  definition of that Repeating Group in the PLI program and the new field
  must be loaded to the data base.

- CHANGE FIELD FORMAT: No provision is made to change from a physical data
  format on the data base to a logical data format required by a program.


4. DATA INTEGRITY

a. Exclusive Control


The  lowest  level  of  exclusive  control  across  partitions  is  at  the  data  element 
type.  The  data  element  type  is  also  the  lowest  level  of  exclusive  control 
within  a partition  using  the  Multi-Thread  Feature  of  System  2000. 

The  System  2000  documentation  is  not  clear  on  how  deadlock  is  prevented  or 
who  is  responsible  for  establishing  exclusive  control  and  resolving  deadlocks. 

Support  for  Program  Isolation,  such  that  a program  may  logically  update  a 
series  of  data  base  records  and  retain  exclusive  control  until  that  logical  series 
of  updates  is  complete,  is  not  clear. 

5. RECOVERY/RESTART

a.  Recovery 

System  2000  automatically  logs  after-images  following  an  update  of  data 
elements. 

A Copy/Restore utility is provided to take an image copy of the data base at
periodic intervals. A utility is also provided to apply after-image log
activity  to  the  data  base  copy  to  reconstruct  the  data  base  in  the  event  of 
physical  damage. 

The  smallest  recoverable  unit  is  the  data  base. 
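
The recovery procedure described above (an image copy plus after-image log) can be sketched in Python as follows. This is a conceptual model only, not the System 2000 utility: the function `roll_forward`, the record identifiers and the log layout are hypothetical. It rebuilds a whole data base, in keeping with the statement that the data base is the smallest recoverable unit.

```python
import copy


def roll_forward(archive, after_images):
    """Rebuild a data base by applying logged after-images to an archival copy."""
    rebuilt = copy.deepcopy(archive)         # never modify the archive itself
    for record_id, image in after_images:    # apply in the order logged
        rebuilt[record_id] = image
    return rebuilt


archive = {"E1": {"NAME": "ADAMS", "SALARY": 900}}
log = [("E1", {"NAME": "ADAMS", "SALARY": 950}),
       ("E2", {"NAME": "BAKER", "SALARY": 1200})]

current = roll_forward(archive, log)         # apply the log completely
partial = roll_forward(archive, log[:1])     # or only in part
print(current)
print(partial)
```
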

b. Batch Restart


• System  2000  automatically  logs  before-images  prior  to  an  update  taking  place. 

• A Backout  utility  is  provided  to  remove  the  effect  of  partially-completed 
update  processing  due  to  a system  power  or  machine  failure. 

• Intermediate restart points can be established in batch programs.

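
The backout of a partially completed update run can be sketched in Python as below. This is an assumption-laden illustration rather than the System 2000 Backout utility: the function `backout`, the log layout, and the use of `None` to mark a record that did not exist before the update are all hypothetical. Before-images are restored newest first, so the data base returns to its state at the start of the run.

```python
def backout(database, before_images):
    """Undo a partially completed update run by restoring before-images."""
    for record_id, image in reversed(before_images):  # newest change first
        if image is None:              # record did not exist before the update
            database.pop(record_id, None)
        else:
            database[record_id] = image


database = {"E1": {"SALARY": 900}}
before_log = []

# A batch update that fails part-way through, logging before each change.
before_log.append(("E1", dict(database["E1"])))
database["E1"]["SALARY"] = 950
before_log.append(("E2", None))
database["E2"] = {"SALARY": 1200}

backout(database, before_log)
print(database)
```
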
c.  On-Line  Restart 

• Message  logging  for  System  2000  is  achieved  by  logging  the  complete 
Immediate  Access  transaction,  which  is  the  data  base  command  if  update  is 
requested.  This  is  carried  out  automatically  by  the  system  and  applies  to  the 
same  log  as  normal  data  base  logging  activity. 

• As  the  Immediate  Access  command  is  logged,  messages  can  be  reprocessed  in 
the  same  event  sequence. 

6.  DATA  SECURITY 

• System  2000  uses  a password  mechanism,  which  enables  up  to  20  passwords  to 
be  defined  for  each  field  or  the  data  base. 

• The  access  options  that  can  be  specified  are  retrieve  only  or  update. 

• The  programmer  specifies  the  appropriate  password  during  the  OPEN  com- 
mand in  PLI  programs  or  in  identifying  the  appropriate  data  base  for  which 
access  is  requested  with  Immediate  Access  or  Queued  Access. 

7. EASE OF USE


a.  Data  Base  Administrator 


The  Data  Definition  Language  used  by  System  2000  is  particularly  easy  to 
specify,  indicating  clearly  the  hierarchical  structure  of  the  data  together  with 
identifying  the  data  types,  picture  clauses  and  key  fields. 

A Data Dictionary documentation and control aid is provided with CONTROL
2000.

Good externals education, user guide documentation and reference
documentation are provided.

b.  Application  Programmer 

The  Data  Manipulation  Language  used  by  System  2000  is  a procedural  language 
(Procedural  Language  Interface)  which  can  be  used  with  COBOL  or  FORTRAN 
programs.  This  has  an  English-like  syntax  with  15  operation  commands. 

The WHERE clause of the PLI supports selection of up to 25 data elements
across  up  to  10  data  sets.  Thus,  the  number  of  record  types  per  command  is 
25. 


Extensive  data  search  capability  is  provided  with  high,  low,  and  equal 
operators  together  with  Boolean  operators  across  multiple  record  types  using 
the  WHERE  clause  as  indicated  above. 

c.  End  User 


The Queued Access mode of Basic System 2000 provides a batch user language,
which can be invoked in either an RJE or an interactive environment.

The  Immediate  Access  feature  provides  an  on-line  user  language. 

• Both access modes permit retrieve, update, add and delete.

8.  COST/PERFORMANCE 

a.  Measurable  Costs 

• System  2000  is  provided  either  on  a purchase  basis  or  a rental  basis.  The  Basic 
System  2000  is  $25,000  purchase,  or  $960.00  rental  per  month  (with  option  to 
purchase). 

• Together  with  Immediate  Access  and  TP  2000,  the  cost  is  approximately 
$50,000  purchase  or  approximately  $1,900.00  per  month  rental  (with  option  to 
purchase). 

• The  following  charges  are  approximate: 


COMPONENT                      FULL PAID UP LEASE    MONTHLY RENTAL
                                                     (Without Purchase Option)

- BASIC SYSTEM                       $25,000               $740

- PROCEDURAL LANGUAGE
    (1st Interface)                    5,000                295
    (Sub Seq. Inter)                   5,000                150

- IMMEDIATE ACCESS                    20,000                590

- REPORT WRITER                       15,000                445

- SEQUENTIAL FILE                      7,500                225

- EXTENDED OPTIMIZATION               10,000                295

- TP MONITOR                           2,500                 75

- MULTIPLE THREAD                     20,000                590
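
As a worked check on the figures above, the following Python sketch totals the table entries for the Basic System, Immediate Access and TP Monitor configuration mentioned earlier. The dictionary `PRICES` and the function `configuration_cost` are hypothetical names used only for this illustration; the prices are those listed in the table.

```python
# (full paid up lease, monthly rental without purchase option), in dollars
PRICES = {
    "BASIC SYSTEM": (25_000, 740),
    "IMMEDIATE ACCESS": (20_000, 590),
    "TP MONITOR": (2_500, 75),
}


def configuration_cost(components):
    """Total the purchase and monthly rental charges for a set of components."""
    purchase = sum(PRICES[c][0] for c in components)
    rental = sum(PRICES[c][1] for c in components)
    return purchase, rental


purchase, rental = configuration_cost(
    ["BASIC SYSTEM", "IMMEDIATE ACCESS", "TP MONITOR"])
print(purchase, rental)
```

The purchase total of $47,500 is consistent with the "approximately $50,000" quoted above. The rental total of $1,405 is lower than the quoted $1,900 because the table lists rentals without the purchase option.
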

b. Real Memory

• The storage required by System 2000 for execution is of the order of 140K
bytes.

c. Performance Constraints

• System  2000  executes  in  a single-thread  mode  on  update  or  in  a batch 
processing  environment.  Multithread  processing  is  provided  with  the  Multi- 
Thread  Feature  in  an  on-line  environment,  supporting  up  to  a maximum  of 
eight  concurrent  tasks  (IBM  version  only). 

• The  exclusive  control  lockout  level  across  partitions  is  at  the  data  element 
type  and,  within  a partition  with  the  multithread  feature,  is  also  on  the  basis 
of  data  element  type. 

• Data  grouping  can  be  achieved  by  the  Data  Base  Administrator  at  load  time  as 
all  repeating  group  types  in  a data  base  reside  in  the  same  data  file  and  can  be 
loaded  sequentially.  Thus,  a particular  data  base  record  (data  entity)  resides 
in  only  one  data  file. 


Relationships  between  various  data  bases  are  not  supported  for  IBM  or  Control 
Data  but  are  supported  for  Univac  with  the  LINK  feature. 


• While  System  2000  is  an  Inverted  File  DBMS  and  provides  very  good  indexed 
access  to  the  data  base,  the  various  indexes  are  supported  by  direct  access 
data  management. 


• A subset of the full capability of System 2000 is provided by Basic System
2000. This supports data base definition and the Queued Access mode.


H.  TOTAL  (CINCOM) 


1.  GENERAL DESCRIPTION

• TOTAL  is  a Data  Base  Management  System  marketed  by  Cincom  Systems  in 
many  countries  of  the  world  today.  It  is  one  of  the  most  widely  used  Data 
Base  Management  Systems,  with  the  number  of  users  throughout  the  world  in 
excess  of  1,500  and  approaching  the  number  of  users  of  IBM  DL/I  Data  Base 
systems. 

• TOTAL  was  conceived  and  developed  by  three  former  IBM  Systems  Engineers 
during  the  latter  half  of  the  1960s.  It  was  first  marketed  in  Cincinnati  in  late 
1968. 

• TOTAL  is  available  in  several  versions:  TOTAL  4,  TOTAL  5/6,  TOTAL  5/65, 
TOTAL  7 and  TOTAL/EI.  Of  these  various  versions,  TOTAL  7 and  TOTAL/EI 
are  the  most  recent,  with  TOTAL  8 understood  to  be  in  development  (or  just 
becoming  available). 

• TOTAL 7 is available on several manufacturers' CPUs:

IBM System/360, System/370, System 3.

Honeywell 200, 2000, Series 60 Level 62 and 66.

Univac 70, 90.

NCR 101, 300, Criterion.

CDC Cyber/170, 70, 6000.

ICL 1900, 2903.

Varian 70.

Digital 11/34 or larger.

Burroughs 2500-4800.

Interdata 7/32, 8/32.

Harris minis.

Siemens.

• In addition to the ENVIRON/1 TP monitor marketed by Cincom, the following
TP monitors are known to interface with TOTAL on IBM CPUs:

Programming Methods Incorporated's INTERCOMM.

TURNKEY's TASKMASTER.

IBM's CICS.

• The required interface code for ENVIRON/1 and CICS is provided by Cincom,
while PMI and TURNKEY provide the interfaces for their own products.

• Query  and  Report  Writer  interfaces  with  TOTAL  exist  for  the  following 
products: 

Cincom's  SOCRATES. 


Informatics'  MARK  IV. 
PMI's  SCORE. 


Cullinane's  CULPRIT. 
a.  Data Base Organization


• TOTAL  utilizes  a Data  Base  organization  scheme  that  is  based  on  network 
structures.  The  term  "network"  used  in  this  context  means  that  a record 
points  to  many  other  records  and  can  be  pointed  to  by  many  records. 

• TOTAL's basic data organization is similar to the chained file Data Base
systems previously offered by IBM: BOMP (Bill of Material Processor), DBOMP
(Data Base Organization Maintenance Processor) and CFMS (Chain File
Management System). TOTAL has many extended capabilities beyond those
provided by these products.

• TOTAL  uses  two  different  types  of  files  in  forming  a Data  Base: 

Master  data  sets  that  are  used  to  store  nonrecurring  information  under 
the  control  of  a unique  key. 

Variable  Entry  data  sets  that  are  used  to  store  repetitive  information 
associated  with  master  data  set  records  and  to  logically  link  master 
records  together. 

• The  advantage  of  TOTAL'S  network  approach  is  the  ability  to  expand  the  data 
base  easily. 

• Certain  points  should  be  kept  in  mind  in  expanding  a TOTAL  data  base  by  the 
addition  of  new  data  sets: 

Master  records  cannot  point  to  other  Master  records. 

Variable  records  cannot  point  to  other  Variable  records. 

• Additionally,  a number  of  other  rules  must  be  followed  concerning  Master  and 
Variable  files: 


© 1978 by INPUT, Menlo Park, CA 94025. Reproduction Prohibited.


Each  file  is  stored  in  a separate  data  set. 
All  files  are  fixed  length  blocked. 


Master  files  are  accessible  only  by  randomizing  the  key  value. 

The  randomizing  algorithm  is  furnished  as  part  of  TOTAL  and  is 
constant  for  all  key  formats.  No  index  or  key  sequential  support  is 
provided.  Keys  must  be  unique. 

Master  files  point  to  Variable  files  by  direct  address  (relative  record 
number). 


Variable  files  point  to  Master  files  via  keys  (symbolic  pointers).  This 
fact  allows  movement  of  records  in  Master  files  without  updating 
pointers  but  has  a cost  in  retrieval  time  because  keys  must  be 
randomized  and  synonym  chains  searched. 

Each Variable file record can exist on multiple "linkage" paths dependent
on multiple Master files.

Each  Variable  file  can  have  multiple  record  formats.  This  facility  will 
sometimes  allow  more  levels  of  hierarchy  but  usually  at  the  expense  of 
disk  space  and  performance.  Disk  space  is  wasted  if  the  formats  vary 
widely  in  length  because  of  the  fixed  length  requirement.  Performance 
suffers  because  the  serial  chain  of  records  is  lengthened. 


Each linkage path contains forward and backward direct pointers.
Variable file records are accessible via the Master records on which they
are dependent, by following the appropriate linkage path.




• Master  file  - a collection  of  single  format,  fixed  length  records  randomly 
addressed  via  a key  conversion  algorithm.  Master  records  can  point  to  and  be 
pointed  at  by  Variable  file  records  but  cannot  point  to  other  Master  file 
records. 

• Variable  file  - a collection  of  fixed  length  records  accessible  only  via  one  or 
more  Master  records  with  which  this  file  is  associated  by  defined  linkage 
paths.  Variable  files  are  used  to  store  repetitive  information  about  Master  file 
records  and  to  link  associated  Master  records  together  on  a network. 

• Linkage  path  - a logical  path  connecting  a Master  record  to  a set  of  Variable 
records.  The  path  physically  consists  of  forward  and  backward  pointers 
located  in  the  Variable  records  and  first  and  last  pointers  in  the  Master  record. 

• Primary  linkage  path  - the  linkage  path  referenced  by  the  application 
programmer  in  a specific  call.  The  record  placement  strategy  in  the  Variable 
file  is  "close  to  insert  point  in  primary  linkage  path." 

• Secondary  linkage  path  - all  other  linkage  paths  defined  for  the  accessed 
Variable  record.  Note  that  a linkage  path  may  be  "primary"  one  time  and 
"secondary"  another  depending  on  how  the  application  code  is  written. 

• Coded  record  - a Variable  Entry  file  may  contain  multiple  formats  of  records 
with  each  format  designated  by  a record  code.  The  logical  content  of  each 
format  may  vary,  but  the  physical  length  is  fixed.  All  coded  records  in  a 
Variable  Entry  data  set  must  share  one  linkage  path  but  may  contain 
references  to  different  linkage  paths. 
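The definitions above can be pictured with a small in-memory model. The Python sketch below is not TOTAL's implementation; class names, field names and the use of list positions as relative record numbers are all assumptions made for illustration. It shows a Master record holding first and last pointers per linkage path, and Variable records holding forward and backward direct pointers together with the symbolic key of each owning Master record.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class MasterRecord:
    key: str                                   # unique symbolic key
    data: dict
    # Per linkage path: relative record number of first/last Variable record.
    first: Dict[str, Optional[int]] = field(default_factory=dict)
    last: Dict[str, Optional[int]] = field(default_factory=dict)


@dataclass
class VariableRecord:
    record_code: str                           # format code of a coded record
    data: dict
    # Per linkage path: symbolic key of the owning Master record.
    master_keys: Dict[str, str] = field(default_factory=dict)
    # Per linkage path: direct pointers to neighbours on the chain.
    forward: Dict[str, Optional[int]] = field(default_factory=dict)
    backward: Dict[str, Optional[int]] = field(default_factory=dict)


class VariableFile:
    def __init__(self) -> None:
        self.records: List[VariableRecord] = []

    def add(self, rec: VariableRecord, owners: Dict[str, MasterRecord]) -> int:
        """Store rec and append it to one chain per (path, Master) in owners."""
        rrn = len(self.records)                # relative record number
        self.records.append(rec)
        for path, master in owners.items():
            rec.master_keys[path] = master.key
            tail = master.last.get(path)
            rec.backward[path], rec.forward[path] = tail, None
            if tail is None:
                master.first[path] = rrn
            else:
                self.records[tail].forward[path] = rrn
            master.last[path] = rrn
        return rrn

    def chain(self, master: MasterRecord, path: str) -> Iterator[VariableRecord]:
        """Follow one linkage path from a Master record, front to back."""
        rrn = master.first.get(path)
        while rrn is not None:
            rec = self.records[rrn]
            yield rec
            rrn = rec.forward[path]
```

In this model a single Variable record can sit on two chains at once (for example, one owned by a SKILL Master record and one owned by a NAME Master record), which is how Variable files link Master records into a network.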

b.  Data  Definition  Language 

• The  Data  Definition  Language  used  by  TOTAL  is  an  English-like  fixed  format, 
control  card  type  of  language.  It  is  generated  separately  from  the  application 
programs  and  is  bound  to  the  application  program  at  execution  time. 
c.  Physical Data Base Organization


• In the following example, the data base could be defined in TOTAL with
two Master files and two Variable files: one Variable file containing
NAME, EDUC and EXPR as Coded Variable records, and the other containing
ADDRESS and SALARY, also as Coded Variable records. This is shown in
Exhibit IV-H1.


• The  use  of  Coded  Variable  records  within  the  Variable  Entry  files  enables  the 
total  number  of  files  to  be  kept  to  a minimum,  but  requires  the  programmer  to 
read  each  of  the  records  in  a chain,  checking  the  Coded  Record  value  to 
identify  each  of  the  unique  NAME,  EDUC  or  EXPR  records. 


• Large  amounts  of  unused  space  in  some  of  the  Variable  file  records  are 
necessary,  because  all  records  within  the  same  Variable  Entry  file  must  be 
fixed  length  and  there  is  a wide  variance  of  requirements  in  different  record 
lengths. 

• Also,  each  Variable  Entry  record  must  contain  within  it  the  full  key  to  each  of 
the  associated  Master  file  records. 


• If the data base contained a large number of people with a particular
skill, the NAME-EDUC-EXPR chain could become quite long, because the only
way to identify a particular record (say, an EXPR record) in the Variable
Entry file is to read all of the NAME, EDUC and EXPR records ahead of it in
the chain. An alternative data base design could be used that places the
NAME, EDUC and EXPR records each in a separate Variable Entry file.
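The two costs of coded records described above, serial reading of the chain and padding to a fixed length, can be made concrete. The Python sketch below is illustrative only; the record contents and the byte lengths are invented for the example and do not come from any TOTAL data base.

```python
def records_with_code(chain, wanted_code):
    """Scan a chain of (record_code, payload) pairs serially.

    Returns the matching payloads and the number of records read,
    since every record on the chain must be read to check its code.
    """
    reads = 0
    matches = []
    for code, payload in chain:
        reads += 1
        if code == wanted_code:
            matches.append(payload)
    return matches, reads


def wasted_bytes(chain, used_lengths):
    """Unused bytes when every record is padded to the longest format."""
    fixed_length = max(used_lengths.values())
    return sum(fixed_length - used_lengths[code] for code, _ in chain)


# Hypothetical chain for one SKILL, mixing three coded record formats.
CHAIN = [
    ("NAME", "Smith"),
    ("EDUC", "BS"),
    ("EXPR", "COBOL, 3 years"),
    ("NAME", "Jones"),
    ("EXPR", "PL/1, 2 years"),
]

# Hypothetical number of bytes each format actually needs.
USED_LENGTHS = {"NAME": 40, "EDUC": 60, "EXPR": 120}
```

Here, finding the two EXPR records requires reading all five records on the chain, and padding every record to the 120-byte EXPR format leaves 220 bytes unused.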
EXHIBIT IV-H1

TOTAL DATA BASE ORGANIZATION CONTAINING
MASTER FILES AND VARIABLE FILES

[Diagram not reproduced; its panels are labeled MASTER FILES and VARIABLE
FILES.]


• However,  because  the  NAME  record  would  be  in  a Variable  Entry  file  and 
could  not  therefore  point  directly  to  the  other  Variable  Entry  file  for  the 
related  ADDR  and  SALARY  record,  it  is  necessary  to  define  an  additional 
dummy  Master  file.  This  dummy  Master  file  would  be  a SKILL/NAME  Master 
file  and  the  data  base  structure  would  now  look  as  shown  in  Exhibit  IV-H2, 
which  contains  no  coded  records. 


• This approach reduces the length of the chains that must be followed to
identify specific NAME, EDUC and EXPR records, and also avoids the space
wasted when all the coded records share one Variable Entry file and its
fixed record length. However, the number of data sets has increased from
4 to 8, because each file must now be in a separate data set.


• This  approach  reduces  the  serial  linkage  path  length  but  requires  random 
access  for  the  SKILL/NAME  Master  in  order  to  get  the  related  EDUC  and 
EXPR  records.  The  trade-offs  between  the  two  different  approaches  depend 
upon  the  actual  data  base  content  and  usage. 


d.  Data  Manipulation  Language 


• TOTAL uses CALL linkages to communicate between the application
programmer and the data management routines. Therefore, TOTAL data bases
can be accessed by any language that can issue a CALL, specifically
ASSEMBLER, COBOL, PL/1 and FORTRAN.


• TOTAL'S  Data  Manipulation  Language  consists  of  a variety  of  functions 
involving  reading,  writing,  adding  and  deleting  records  from  Master  files  and 
Variable  entry  files.  These  functions  are  communicated  to  TOTAL  via  a CALL 
and  its  associated  parameters. 
EXHIBIT IV-H2

DATA BASE STRUCTURE
WITH SKILL/NAME DUMMY MASTER FILE

[Diagram not reproduced.]

e.  Performance Considerations
• TOTAL  enables  several  different  record  formats  to  be  combined  in  the  one 
Variable  Entry  file,  as  coded  records,  or  allows  each  record  type  to  reside  in  a 
separate  Variable  Entry  data  set. 

• As  shown  in  Exhibit  IV-H2,  in  order  to  access  each  of  the  NAME  records  for 
people  having  a particular  SKILL,  TOTAL  would  first  access  the  SKILL  Master 
file  and  then  the  SKILL/NAME  Variable  Entry  file.  This  will  identify  each  of 
the  NAME  Master  file  records  with  a particular  SKILL  which  can  then  be 
accessed  in  turn. 


• Thus,  to  retrieve  the  first  NAME  record  would  require  two  file  accesses. 
Subsequent  NAME  records  with  the  same  SKILL  will  require  additional 
accesses  to  the  SKILL/NAME  Variable  Entry  file. 


• If it was necessary to obtain the EDUC and EXPR records for a person with a
particular SKILL, the SKILL Master file would first be accessed, linkage path
1 would be followed to the SKILL/NAME Variable Entry file, which would then
be searched (using FINDX, for example) to identify the particular NAME. Then
linkage path 2 would be followed to the SKILL/NAME Master file. This access
would be followed by another access on linkage path 4 to the related EDUC
record and linkage path 5 to the related EXPR records in the corresponding
Variable Entry files.


• Consequently,  the  minimum  number  of  file  accesses  to  retrieve  NAME,  EDUC 
and  EXPR  information  for  the  first  person  with  a particular  SKILL  would 
involve  one  file  access  to  the  SKILL/NAME  Variable  Entry  file  and  then  four 
accesses,  one  to  the  related  Master  file,  one  to  the  related  SKILL/NAME 
Master  file  and  then  one  each  to  the  related  EDUC  and  EXPR  Variable  Entry 
files. 


• This  results  in  a total  of  five  file  accesses.  Additional  EDUC  and  EXPR 
records  for  a particular  person  would  require  further  accesses. 

• If it was necessary to identify a particular person by NAME with a particular
SKILL,  the  SKILL  Master  file  would  be  used  to  access  the  SKILL/NAME 
Variable  Entry  file  and  a FINDX  operation  would  search  that  Variable  Entry 
file  until  the  required  NAME  record  is  identified.  At  this  point  access  can 
then  be  made  to  the  SKILL/NAME  Master  file  and  the  associated  EDUC  and 
EXPR  Variable  Entry  files  for  that  NAME. 

• TOTAL  provides  only  one  randomizing  algorithm  to  locate  a Master  file  record 
from  a symbolic  key.  The  inability  to  modify  the  randomizing  algorithm  may 
make  it  difficult  to  group  highly  active  records  close  together,  if  their  keys 
are  significantly  different. 
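Key randomizing with synonym chains can be sketched as follows. This Python fragment is an assumption-laden illustration: TOTAL's actual randomizing algorithm is not described here, so CRC-32 modulo the number of slots is used purely as a stand-in, and the class and method names are hypothetical. The point it shows is that a record's location is fixed by its key alone, and that each synonym ahead of a record adds to the cost of reaching it.

```python
import zlib


class HashedMasterFile:
    """Master file addressed only by randomizing (hashing) a unique key."""

    def __init__(self, slots):
        # Each home slot holds a synonym chain of (key, data) pairs.
        self.slots = [[] for _ in range(slots)]
        self.seeks = 0          # records examined while reading

    def _home(self, key):
        # Stand-in randomizing algorithm; the same one serves every key format.
        return zlib.crc32(key.encode("ascii")) % len(self.slots)

    def add(self, key, data):
        chain = self.slots[self._home(key)]
        if any(existing == key for existing, _ in chain):
            raise KeyError(f"duplicate key {key!r}")    # keys must be unique
        chain.append((key, data))

    def read(self, key):
        # Walk the synonym chain from the home slot until the key matches.
        for existing, data in self.slots[self._home(key)]:
            self.seeks += 1
            if existing == key:
                return data
        return None
```

Because the caller cannot influence `_home`, two highly active records with dissimilar keys land wherever the algorithm places them, which is the grouping difficulty noted above.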

• The  buffer  management  scheme  used  by  TOTAL  consists  of  a single  I/O  area 
for  each  data  set.  If  the  requested  record  is  in  the  I/O  area,  it  is  not  re-read. 
When  an  update  is  performed,  the  block  is  not  written  until  the  area  is 
required  to  read  a new  block. 
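The single I/O area per data set, with its deferred write, behaves roughly like the following Python sketch. It is a simplified model under stated assumptions: blocks are Python lists, records are addressed by relative record number, and the class and counter names are invented for the example.

```python
class DataSetBuffer:
    """One I/O area for one data set, with the write deferred until reuse."""

    def __init__(self, blocks, records_per_block):
        self.disk = blocks                  # list of blocks, each a list of records
        self.rpb = records_per_block
        self.block_no = None                # block currently held in the I/O area
        self.area = None
        self.dirty = False                  # True when the area holds an update
        self.reads = 0
        self.writes = 0

    def flush(self):
        """Write the I/O area back if it holds an unwritten update."""
        if self.dirty:
            self.disk[self.block_no] = list(self.area)
            self.writes += 1
            self.dirty = False

    def _load(self, block_no):
        if block_no == self.block_no:
            return                          # already in the I/O area: no re-read
        self.flush()                        # area is needed for a new block
        self.area = list(self.disk[block_no])
        self.block_no = block_no
        self.reads += 1

    def read(self, rrn):
        self._load(rrn // self.rpb)
        return self.area[rrn % self.rpb]

    def update(self, rrn, value):
        self._load(rrn // self.rpb)
        self.area[rrn % self.rpb] = value
        self.dirty = True                   # block is written only on reuse
```

Two consecutive requests against the same block cost one physical read, and an updated block reaches the disk only when a request for a different block forces the area to be reused.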

• The  need  for  an  I/O  area  per  file  could  have  a possible  effect  on  the  main 
storage  requirements  for  a large  data  base.  To  cope  with  this,  TOTAL  offers 
the  ability  to  share  I/O  areas  between  like  data  sets  (that  is,  two  or  more 
Variable  Entry  files  or  two  or  more  Master  files,  but  not  a Variable  Entry  and 
a Master).  Depending  upon  the  interactivity  of  the  particular  Variable  Entry 
or  Master  files  sharing  the  same  I/O  area,  this  may  or  may  not  have  an  effect 
on  performance. 

• TOTAL is able to access a record in a data base very rapidly. CINCOM quotes
an average of 1.1 seeks necessary to access a specific Master file record.
Following  this,  the  number  of  accesses  necessary  to  retrieve  all  of  the  related 
records  depends  (of  course)  upon  the  data  base  structure,  complexity  and 
linkage  path  lengths  and  the  number  of  separate  Master  and  Variable  Entry 
files  required  to  retrieve  that  related  information. 

• While the structure of TOTAL enables very good performance to be achieved
with a simple data base, as the data base increases in complexity and
additional Master and Variable Entry files are added, the number of accesses
necessary to retrieve all related records also increases proportionally.
This is a significant factor to keep in mind in data base design and
subsequent expansion.

2.  BASIC FUNCTIONAL CAPABILITIES


a.  Easy  Accessibility 

TOTAL implements a CALL level of interface and can therefore be invoked by
any programming language that supports a CALL interface. This includes
ASSEMBLER as a machine-oriented language, COBOL and PL/1 as commercial
languages, and PL/1 and FORTRAN as scientific languages.


A commercial  end  user  language  called  SOCRATES  is  available.  This  English- 
like  report  writer  language  uses  a four-step  operation. 

Logic  evaluation  - the  "COBOL-like"  user  language  is  translated  to 
internal  syntax. 


Extract  - data  is  extracted  from  the  data  base. 


Sort  - records  are  ordered  using  the  system  sort  program. 

Print  - report  preparation. 

The Data Communication support offered by Cincom for TOTAL is ENVIRON/1.
ENVIRON/1 uses software-controlled virtual storage based on 512-byte pages,
which allows its execution in limited storage. Specialized access methods
and software paging enable ENVIRON/1 to offer good performance, particularly
on small machines.

Additionally, a number of other TP monitors are able to interface with
TOTAL. These include CICS, INTERCOMM and TASKMASTER.
b.  Multiple Views Of Data


• TOTAL  supports  random  access  to  data,  and  provides  high  access  efficiency  to 
a Master  file  record.  A single  randomizing  algorithm  is  used  to  develop  the 
location  of  the  specific  Master  record  relating  to  a key.  This  randomizing 
algorithm  applies  for  all  data  bases  and  records  and  cannot  be  changed. 

• Sequential  access,  based  on  keys,  is  not  supported.  Instead,  TOTAL  supports 
consecutive  access,  which  will  be  sequential  access  provided  that  Master 
records  (loaded  initially  in  a specific  sequence)  are  not  moved. 

• TOTAL  does  not  itself  support  an  index  structure  but  enables  the  user  to 
define,  for  example,  an  ISAM  file  which  can  be  used  to  point  to  related  records 
in  TOTAL  Master  or  Variable  Entry  files. 

c.  Data  Consolidation 

• TOTAL  supports  two  types  of  repeating  groups,  Master  records  and  Variable 
Entry  records.  A Master  record  must  be  an  owner  or  a parent,  while  a 
Variable  Entry  record  must  be  a member  or  dependent  to  a Master.  This 
means  that  only  two  levels  of  repeating  groups  can  be  defined  for  a data  base 
record. 

• TOTAL  supports  a maximum  of  2500  Master  and  Variable  Entry  records  and 
permits  this  number  of  repeating  group  types  to  be  supported  per  data  base 
record.  Within  a repeating  group  type,  there  is  no  limit  to  the  number  of 
occurrences  of  that  type. 

• TOTAL  does  not  support  variable  length  occurrences  of  repeating  group  types. 

In  fact,  while  multiple  repeating  group  types  may  reside  within  the  one 
Variable  Entry  file,  as  coded  records  (i.e.,  with  a uniquely  defining 
record  code),  all  of  those  coded  records  within  the  one  Variable  Entry 
file  must  have  the  same  fixed  length. 
Alternatively, these coded records can reside in their own separate
unique  Variable  Entry  files,  in  which  case  the  fixed  record  length  can  be 
optimized  to  each  unique  type. 

There  is  no  limit  to  the  number  of  related  records  per  data  base,  data  base 
record  or  program. 

3.  DATA INDEPENDENCE

CINCOM  highlights  the  data  independence  facilities  offered  by  TOTAL,  based 
upon  the  field  level  data  definition  facilities  implemented  in  TOTAL. 

One of the CALL parameters in the TOTAL Data Manipulation Language is a
"data element list." In this list the programmer requests names of fields or
field groups in any sequence he desires.

This  capability,  if  properly  used,  allows  fields  to  be  added,  deleted  or  moved 
within  a record  with  a considerable  degree  of  independence.  The  field  format 
(packed,  character,  binary)  is  not  defined  and,  therefore,  no  translation  occurs. 

The  programmer  defines  the  element  lists  within  his  program  so  he,  in  effect, 
determines  the  level  of  data  independence  his  particular  program  will  have. 
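The effect of the data element list can be sketched in Python. This is not TOTAL's CALL interface; the function, the layout tables and the DEPT field are hypothetical, and only the idea is taken from the text: fields are requested by name, located through a separately maintained definition, and delivered in the requested order without any format translation.

```python
def extract(record, layout, element_list):
    """Return the raw bytes of each requested field, in the requested order.

    layout maps a field name to (offset, length) within the fixed-length
    record. The bytes are returned untranslated.
    """
    fields = []
    for name in element_list:
        offset, length = layout[name]
        fields.append(record[offset:offset + length])
    return fields


# Original definition: NAME followed by a 4-byte binary SALARY.
LAYOUT_V1 = {"NAME": (0, 10), "SALARY": (10, 4)}
RECORD_V1 = b"SMITH".ljust(10) + b"\x00\x01\x86\xa0"

# Redefinition: a new DEPT field is added in front, moving the other fields.
LAYOUT_V2 = {"DEPT": (0, 3), "NAME": (3, 10), "SALARY": (13, 4)}
RECORD_V2 = b"D01" + b"SMITH".ljust(10) + b"\x00\x01\x86\xa0"
```

A program that asks for `["SALARY", "NAME"]` receives the same two values from either version of the record, so adding or moving fields need not disturb it. The SALARY bytes are still delivered as stored, so a change of field format would not be hidden in the same way.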

a.  Levels  Of  Mapping 

TOTAL  supports  a single  level  of  Internal  mapping,  whereby  the  application 
program  uses  directly  a physical  data  base  map. 

The  additional  data  independence  offered  by  External  and  Conceptual  maps  is 
not  supported  by  TOTAL,  but  as  described  above,  field  level  definition  is 
supported  and  provides  a significant  level  of  data  independence,  which  can 
offset  the  disadvantage  of  only  one  level  of  mapping. 
• Since the application program must be aware of the physical structure of the
data base and the various files and link paths used, only limited physical
data base restructuring is possible without significant program modification.
Additional files and new relationships can be easily defined, but
modification of existing file relationships may require substantial program
change.

• The  programmer  requests  the  specific  elements  to  be  delivered  to  the  program 
by  field  name,  and  TOTAL  extracts  those  fields  from  the  records  retrieved  for 
presentation  to  the  program. 

b.  Data  Base  Changes 

• The  data  independence  offered  by  TOTAL  can  be  assessed  from  the  ability  to 
incorporate  the  following  data  base  changes. 

Change Device Type: A change of device type requires only reloading of
the data base on the appropriate device. No recompilation or change of
program logic is necessary.

Change Access Method: As TOTAL uses a direct access method, it is
not possible to change to different operating system access methods.

Change Entity View: Similarly, the ability to change the program's
view of the data base is restricted in TOTAL, which supports only
random access to the data base.

Add New Entity: The addition of a new data base record requires no
change in the program, recompilation or reloading of the data base.

Add New Repeating Group Type: The addition of a new repeating group
type (related record) may require some change to the existing program
logic, together with recompilation and data base loading, if the new
repeating group type becomes part of an existing link path. However, if
the new record resides on a completely new link path, existing program
logic may not be affected.

Add New Relationship: The above comments also apply to the addition
of new record relationships (i.e., new record link paths).

Add New Field to Repeating Group: The addition of a new field to a
repeating group (record) does not require any change in program logic
or recompilation, as programs request only those fields they require by
name. The appropriate file must, of course, be reloaded to reflect the
new field in the repeating group (record).

Change Field Format: A change in field format may require changed
program logic (particularly for ASSEMBLER programs), together with
recompilation and data base load.


Data  independence  should  be  measured  by  the  degree  to  which  the  program  is 
dependent  upon  the  physical  structure  definition.  The  following  list  contains 
the  structural  dependencies  of  TOTAL'S  Data  Manipulation  Language. 
Changes  or  additions  in  these  areas  will  most  likely  cause  a change  to  some 
programs. 


Type  of  file  - the  programmer  must  know  if  the  required  record  is  in  a 
Master  file  or  a Variable  file. 


Interrecord  relationships  - in  order  to  insert  a Variable  Entry  file 
record,  the  programmer  must  know  all  Master  records  that  link  to  this 
record,  because  these  files  must  be  "opened"  and  exclusive  control 
established  by  the  program.  In  order  to  delete  a Master  record,  all 
related  Variable  file  records  must  be  previously  deleted. 


Record  coding  - if  coded  records  are  used  within  a Variable  Entry  file, 
the  program  is  written  differently  than  if  different  record  types  reside 
in  different  Master  and  Variable  Entry  files.  The  application  program 
must  be  fully  aware  of  the  physical  structure  of  the  data  base  and  must 
follow  link  paths  through  that  data  base,  accessing  Master  and  Variable 
Entry  files  to  retrieve  all  related  records  necessary. 
• This can significantly inhibit the ability of the Data Base Administrator to
change  the  physical  structure  later  without  causing  a significant  amount  of 
program  modification. 

4.  DATA  INTEGRITY 

a.  Exclusive  Control 

• Exclusive  Control  exists  in  TOTAL  at  one  of  two  levels  depending  on  the 
version  of  TOTAL  used  and  how  the  application  programmer  opens  the  data 
set. 

• The  application  programmer  issues  the  CNTRL  operation  at  the  start  of  the 
program  and  RESERVES  each  data  set  for  update. 

A control  record  in  the  data  set  is  marked.  This  control  record  is 
checked  by  any  copy  of  TOTAL  that  opens  this  data  set. 

If  the  data  set  is  already  reserved,  a code  is  returned  to  the  application 
program  which  must  handle  the  situation. 

If a program that has reserved a data set ABENDs, the data set remains
locked, thus preventing access to a potentially damaged area. A utility
is provided to "unlock" the data set.
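Data set level reservation, as described above, can be modelled in a few lines of Python. The sketch is hypothetical throughout: the status codes "OK" and "HELD", the function names and the in-memory control record are assumptions for illustration, not TOTAL's CNTRL operation.

```python
class DataSet:
    def __init__(self, name):
        self.name = name
        self.reserved_by = None     # stands in for the control record mark


def reserve(data_set, program):
    """Mark the control record; return a status code to the caller."""
    if data_set.reserved_by is not None:
        return "HELD"               # the application must handle this itself
    data_set.reserved_by = program
    return "OK"


def release(data_set, program):
    """Normal end of program: clear the mark if this program holds it."""
    if data_set.reserved_by == program:
        data_set.reserved_by = None


def unlock_utility(data_set):
    """Operator action after an ABEND: clear the mark unconditionally."""
    data_set.reserved_by = None
```

If the reserving program ends abnormally, `release` is never called, so later `reserve` requests keep returning "HELD" until the unlock utility is run. This keeps other programs away from a possibly damaged data set.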

• The  second  level  of  Exclusive  Control  is  at  the  record  level.  It  is  usable  only 
in  a "Central  TOTAL"  version,  which  is  only  available  in  an  OS  environment. 

This  level  is  invoked  by  the  SHARE  option  of  the  CNTRL  operation. 

No  detail  is  available  on  how  this  level  of  Exclusive  Control  works  or  if 
the  areas  of  deadlock  detection,  roll  back,  and  program  isolation  are 
addressed. 
• While TOTAL maintains Exclusive Control on behalf of an application
program, it is the responsibility of the programmer to invoke the appropriate
level of control, and this should be done consistently through all application
programs in the installation.

5.  RECOVERY/RESTART

a.  Recovery

• TOTAL 7 supports logging, checkpoint and recovery facilities. The logging
and checkpoint facilities are invoked and controlled by the application
programmer, with recovery being achieved by a utility called RECOVER 7,
supplied by Cincom.


• The  following  operations  are  used  by  the  application  program  to  implement 
checkpoint  and  recovery. 

SINON:  In  the  "SINON"  operation,  the  programmer  selects  the  logging 
option  (before-images,  after-images,  both,  or  none).  The  default  option 
is  "none." 


CNTRL LOG QUIET: This option of the "CNTRL" operation forces all
updated buffers to be written to the data base and creates a restart
point on the log file.


CNTRL  LOG  MARK:  Allows  user  information  such  as  transactions  to 
be  written  to  the  log. 

• The  RECOVER  7 utility  can  be  used  to  process  the  log  tapes  to: 

Rebuild  a data  base,  applying  after  images  to  a backup  copy. 

Backout  a data  base  to  the  last  "quiet  point"  or  to  the  beginning  of  the 
log  file,  using  the  before-images. 
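The two uses of the log can be illustrated with a short Python sketch. It is not the RECOVER 7 utility: the log entry layout, the function names and the use of a dictionary as the data base are assumptions made for the example. Rebuilding applies after-images forward to a backup copy; backing out applies before-images in reverse, stopping at the most recent quiet point unless asked to go back to the start of the log.

```python
def _apply(db, key, image):
    """Install an image; an image of None means the record does not exist."""
    if image is None:
        db.pop(key, None)
    else:
        db[key] = image


def rebuild(backup, log):
    """Roll forward: apply every after-image, in order, to a backup copy."""
    db = dict(backup)
    for entry in log:
        if entry[0] == "UPDATE":
            _, key, _before, after = entry
            _apply(db, key, after)
    return db


def backout(db, log, to_quiet_point=True):
    """Roll back: apply before-images in reverse order.

    Stops at the most recent quiet point, or continues to the beginning
    of the log when to_quiet_point is False.
    """
    restored = dict(db)
    for entry in reversed(log):
        if entry[0] == "QUIET":
            if to_quiet_point:
                break
            continue
        _, key, before, _after = entry
        _apply(restored, key, before)
    return restored


# Hypothetical log: (kind, key, before-image, after-image) or ("QUIET",).
LOG = [
    ("UPDATE", "A", 1, 10),
    ("QUIET",),
    ("UPDATE", "B", 2, 20),
    ("UPDATE", "C", None, 30),
]
```

Both functions need the corresponding images to have been logged, which is why the option chosen at SINON time determines which kinds of recovery remain possible.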
• Thus, logging is supported by TOTAL, but must be invoked by the application
programmer  in  the  SINON  operation  and  is  dependent  upon  the  options 
specified  at  SINON  time  by  the  programmer. 

• No  utility  is  provided  to  recover  log  records  in  storage  from  the  log  buffer  in 
the  event  of  a system  failure,  and  no  facility  is  provided  to  accumulate  log 
activity,  retaining  only  the  most  recent  copy  of  data  base  records  and  sorting 
into  data  base  sequence  for  faster  recovery. 

• Recovery  can  be  carried  out  on  an  individual  file  within  the  data  base. 
However,  TOTAL  does  not  provide  recovery  at  the  track  level. 

b.  Batch  Restart 

• While  TOTAL  accepts  responsibility  for  logging  before-images,  it  is  the 
application  programmer's  responsibility  to  invoke  the  appropriate  level  of 
logging  at  the  start  of  each  program. 

• Backout  support  is  provided  by  the  RECOVER  7 utility  program,  permitting 
backout  to  a previous  "quiet  point"  checkpoint. 

c.  On-Line  Restart 

• TOTAL and ENVIRON/1 provide system support for message logging, together
with  system  restart  in  the  event  of  a system  failure.  This  system  restart 
involves  re-establishing  the  on-line  system,  together  with  backout  activity 
against  the  data  base  of  any  partially  processed  on-line  programs  up  to  a 
previous  "quiet  point,"  or  to  the  start  of  the  program. 

• During  on-line  operation,  TOTAL  does  not  permit  a "quiet  point"  (checkpoint) 
to  be  taken  with  processing  in  flight  but  requires  all  processing  to  be 
temporarily  quiesced  before  that  checkpoint  is  taken.  Depending  on  the 
frequency  of  checkpointing,  this  may  have  an  impact  on  on-line  performance. 
• No facilities are provided by TOTAL for dynamic task backout and
reprocessing in the event of a program ABEND or failure.


6.  DATA SECURITY


Data security is invoked by the SINON CALL in a program. In this CALL,
the programmer can specify an "access" parameter with the following options:

READ  ONLY  - only  read  calls  will  be  allowed. 

UPDATE  - add,  delete,  replace,  read  calls  are  allowed. 


The  selected  access  mode  applies  to  the  entire  DBMOD  (Data  Base  Module) 
and  not  to  the  selected  data  sets.  Since  a program  can  "SINON"  to  one 
DBMOD  only,  the  tendency  is  to  define  the  entire  data  base  in  a single 
DBMOD.  This  means  that  any  program,  once  signed  on  to  a DBMOD,  has 
access  to  the  entire  data  base  and  can,  if  signed  on  for  update  mode,  update 
any  part  of  that  data  base. 
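
The scope of the access parameter can be illustrated with a small model. The
sketch below is not TOTAL code: the names `Session`, `sign_on` and the data set
names are hypothetical, and it assumes only what is stated above, namely one
access mode per sign-on, applied to every data set in the DBMOD.

```python
from dataclasses import dataclass

READ_ONLY = "READ ONLY"
UPDATE = "UPDATE"

# Call types permitted under each access option, as listed above.
ALLOWED_CALLS = {
    READ_ONLY: {"read"},
    UPDATE: {"add", "delete", "replace", "read"},
}


@dataclass(frozen=True)
class Session:
    """A program signed on to one DBMOD with a single access mode."""
    data_sets: frozenset
    access: str

    def may(self, call, data_set):
        # The mode is checked for the DBMOD as a whole, never per data set.
        return data_set in self.data_sets and call in ALLOWED_CALLS[self.access]


def sign_on(dbmod_data_sets, access):
    """Hypothetical stand-in for the sign-on call."""
    if access not in ALLOWED_CALLS:
        raise ValueError(f"unknown access option: {access}")
    return Session(frozenset(dbmod_data_sets), access)


# Entire data base defined in a single DBMOD (data set names are invented).
session = sign_on({"EMPLOYEE", "PAYROLL", "PROJECT"}, UPDATE)

# A program signed on for update to maintain PROJECT can also replace PAYROLL.
print(session.may("replace", "PAYROLL"))
```

In this model there is no place to record that a program may update one data
set but only read another, which is the enforcement gap described above.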


This  may  have  some  implications  in  terms  of  the  ability  to  enforce  a particular 
level  of  security  in  the  installation.  No  facilities  are  provided  by  TOTAL  to 
enable  a Data  Base  Administrator  to  enforce  the  appropriate  level  of  update 
security  independent  of  the  application  programs. 


While it can be argued that it is the programmer who knows best how to
process the data base, there are possible implications in terms of the ability to
change the data base design at a later time with minimum program
maintenance.

7. EASE OF USE

a.  Data  Base  Administrator 

• The  Data  Definition  Language  type  used  by  TOTAL  is  a control  card  type  and 
is  particularly  easy  to  use  for  the  definition  of  new  data  bases.  Because  of  the 
ease  in  data  base  definition,  apparently  no  design  or  measurement  aids  are 
provided. 

• To  assist  in  documentation  and  control,  Cincom  provides  a Data  Dictionary, 
which  enables  the  installation  to  maintain  control  over  all  of  the  data  in  its 
data  base  and  its  use  by  different  application  programs,  systems,  etc. 

• Cincom  provides  facilities  for  the  restructuring  of  data  bases,  together  with 
the  ability  to  reorganize  the  data  base.  Additionally,  dynamic  reorganization 
is  supported  by  TOTAL  for  Master  files. 

• Conversion  aids  are  provided  to  convert  from  IBM's  BOMP  (Bill  of  Material 
Processor)  and  DBOMP  (Data  Base  Organization  Maintenance  Processor)  to 
TOTAL. 

• Good Externals education and reference documentation is provided by Cincom
for TOTAL and ENVIRON/1. However, Cincom only supplies the object code
and not the source code for TOTAL. Consequently, Internal logic education
and documentation is not generally available.

• The  main  advantage  of  TOTAL  over  other  data  base  systems  is  its  ease  of  use 
by  the  Data  Base  Administrator. 

• However, the advantage this offers is offset to some extent by the additional
application program complexity, as the programmer must be aware of the
physical data base structure and must follow the appropriate linkage paths
through the various files comprising the data base.

It also has implications for the ability of the Data Base Administrator later
to change the data base design significantly if differing application needs
demand it.


b.  Application  Programmer 

The  Data  Manipulation  Language  used  for  TOTAL  programs  is  a CALL  type 
language,  with  28  different  operations  that  may  be  invoked  by  the  application 
programmer. 

Different commands are used for the same function depending upon the type
of file being accessed (for example, the command DELM is used to delete a
record from a Master file while the command DELV accomplishes the same
function in a Variable Entry file).

TOTAL  transfers  a single  record  to  or  from  main  storage  as  a result  of  a single 
TOTAL  command.  This  record  may  be  comprised  of  a number  of  fields  to  be 
extracted  from  the  physical  record.  However,  TOTAL  does  not  have  the 
ability  to  specify  in  one  command  that  selected  fields  from  different  Master 
and  Variable  Entry  files  be  transferred  to  the  program. 

Instead, the appropriate linkage path must be specified by the application
programmer so that TOTAL can retrieve the appropriate related record from
either file.

The  FINDX  command  allows  a programmer  to  search  a single  record  on  a high, 
low  or  equal  basis.  Multiple  fields  within  a record  can  be  searched,  but  the 
same  relational  operator  is  applied  to  all  fields. 
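
The effect of applying one relational operator to all search fields can be
sketched as follows. This is an illustration of the behavior described above,
not the FINDX interface itself: the function name, the dictionary
representation of a record and the sample values are all assumptions.

```python
import operator

# The three bases named above: high, low or equal.
RELATIONAL_OPERATORS = {
    "high": operator.gt,
    "low": operator.lt,
    "equal": operator.eq,
}


def record_matches(record, arguments, basis):
    """Return True if every named field satisfies the one shared operator.

    `record` and `arguments` are dicts keyed by field name (a hypothetical
    representation); `basis` is "high", "low" or "equal".
    """
    compare = RELATIONAL_OPERATORS[basis]
    return all(compare(record[field], value) for field, value in arguments.items())


employee = {"SALARY": 21000, "DEPT-NO": 16}

# Both fields are tested on a "high" basis; both are higher, so this matches.
print(record_matches(employee, {"SALARY": 20000, "DEPT-NO": 10}, "high"))

# DEPT-NO is equal, not higher, so the same "high" search does not match.
print(record_matches(employee, {"SALARY": 20000, "DEPT-NO": 16}, "high"))
```

A condition such as "salary higher than 20000 and department equal to 16"
mixes two operators and so cannot be expressed in a single search of this
kind; the program would have to test the second field itself.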

No support is provided for Boolean search capabilities or the ability in one
command to search on multiple record types in different files.

c. End User

• The user language support offered by Cincom is a batch query language called
SOCRATES. This provides a retrieve capability against data bases only, with
no update, add or delete support.

• PICS  (Production  Information  Control  System)  has  been  marketed  by  Cincom. 
However,  little  is  known  regarding  other  application  program  support  for 
TOTAL. 

8.  COST/PERFORMANCE 

a.  Measurable  Costs 

• The  costs  for  TOTAL  7 vary  depending  upon  the  degree  of  function  required. 
A full  function  TOTAL  7 System  for  IBM's  DOS/VS  rents  for  $850  per  month 
($34,000  purchase)  while  the  OS/VS  version  rents  for  $1,050  per  month 
(purchase  $39,000).  This  includes  maintenance  charges  together  with  a defined 
amount  of  Systems  Engineer  free  time  to  assist  in  installation. 

• On  the  other  hand,  the  Basic  version  of  TOTAL  for  Interdata  7/32  and  8/32  has 
a purchase  price  of  $13,500,  while  the  central  version  is  $16,500. 

b.  Real  Memory 

• A typical  TOTAL  DOS  System,  including  all  TOTAL  7 modules,  control  blocks, 
I/O  access  methods  and  I/O  areas,  requires  approximately  40K  to  50K  of  real 
storage  while  TOTAL-OS  requires  approximately  50K  to  60K  of  real  storage. 

• As  there  is  no  "Control"  version  of  TOTAL  under  DOS,  a copy  of  TOTAL  must 
reside  in  each  user  partition.  Each  additional  partition  also  requires  an  extra 
40K  to  50K  real  storage  in  addition  to  that  required  for  application  programs. 
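
The DOS storage requirement can be worked through directly from the figures
above. The sketch below assumes only the quoted 40K to 50K per copy of TOTAL
and one copy per user partition; the function name is illustrative.

```python
# Figures quoted above for a typical TOTAL DOS system, in K of real storage.
TOTAL_DOS_LOW_K = 40
TOTAL_DOS_HIGH_K = 50


def total_dos_storage_k(partitions):
    """Return the (low, high) real storage taken by TOTAL itself.

    Under DOS each user partition holds its own copy of TOTAL, so the
    requirement grows linearly with the number of partitions. Storage
    for the application programs themselves is not included.
    """
    if partitions < 1:
        raise ValueError("at least one partition is required")
    return partitions * TOTAL_DOS_LOW_K, partitions * TOTAL_DOS_HIGH_K


for n in (1, 2, 3):
    low, high = total_dos_storage_k(n)
    print(f"{n} partition(s): {low}K to {high}K")
```

Three user partitions would therefore commit between 120K and 150K of real
storage to TOTAL alone.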

• In the OS environment, one copy of the "control" version of TOTAL will
service multiple partitions or regions. The additional real storage required in
each region is not known.

c. Performance Constraints


TOTAL supports multi-thread access of the data base for improved
performance.

The  level  of  lockout  (for  Exclusive  Control  purposes)  is  at  the  level  of  the  file 
(for  DOS)  to  provide  update  protection  across  partitions.  It  is  at  the  level  of 
the  repeating  group  (Master  or  Variable  Entry  file  record)  occurrence  for 
TOTAL-DOS  to  provide  update  protection  within  a partition. 

TOTAL-OS, with the "Control" version, supports Exclusive Control at the
repeating group (Master or Variable Entry file record) occurrence for
protection both between partitions and within a partition against concurrent
update in a multi-thread environment.

Under  TOTAL  each  repeating  group  type  (file)  is  assigned  its  own  I/O  area. 
However,  in  order  to  reduce  the  real  storage  requirements,  I/O  areas  can  be 
shared  between  like-file  types. 

Each  repeating  group  type  can  either  be  assigned  a different  data  set,  or 
several  repeating  group  types  (records)  can  reside  within  a Variable  entry  file 
as  coded  records. 

The  first  approach  requires  the  application  programmer  to  follow  the 
appropriate  link  path  to  each  related  file. 

The  second  approach  requires  the  application  programmer  to  test  the 
appropriate  record  code  in  following  a link  through  a particular  Variable 
Entry  file. 

Relationships between Master and Variable Entry files are direct, based upon a
relative record number pointer in the Master file. However, relationships
from Variable Entry files to Master files are indirect, based upon the symbolic
key of the Master file, which is stored in the Variable Entry file.

This is used as the pointer from the Variable Entry file to the Master
file and has the advantage that records in the Master file can move for
dynamic reorganization as controlled by TOTAL, without requiring
modification of pointers (symbolic keys) in the Variable Entry file.

Off-setting  this  advantage  is  the  additional  storage  requirement  to 
maintain  symbolic  keys  in  each  record  in  the  Variable  Entry  file.  This 
can  represent  a significant  amount  of  disk  storage  space  if  large  keys 
are  required  by  the  application. 
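
The difference between the two pointer types can be modeled in a few lines.
The sketch below is a simplified illustration, not TOTAL's storage format:
the record layouts, the key values such as "EMP427" and the function names are
assumptions, and the random access method used to find a Master record is
replaced by a simple search.

```python
# Variable Entry file: relative record number (list index) -> record.
# Each record carries the symbolic key of its Master record.
variable_entry = [
    {"master_key": "EMP427", "dependent": "JONAS P.C."},
    {"master_key": "EMP513", "dependent": "ALLEN C."},
]

# Master file: slot number -> record. Each record holds a direct pointer
# (relative record number) to its related Variable Entry record.
master = {
    0: {"key": "EMP427", "first_variable_rrn": 0},
    1: {"key": "EMP513", "first_variable_rrn": 1},
}


def find_master(key):
    """Locate a Master record by symbolic key (stand-in for the random
    access method, which is not modeled here)."""
    for record in master.values():
        if record["key"] == key:
            return record
    raise KeyError(key)


def reorganize_master():
    """Move every Master record to a new slot, as reorganization might."""
    moved = {slot + 100: record for slot, record in master.items()}
    master.clear()
    master.update(moved)


reorganize_master()

# Variable Entry -> Master still resolves: the symbolic key did not change.
owner = find_master(variable_entry[1]["master_key"])

# Master -> Variable Entry is a direct fetch by relative record number.
print(variable_entry[owner["first_variable_rrn"]]["dependent"])

# The price is one copy of the key in every Variable Entry record.
key_bytes = sum(len(record["master_key"]) for record in variable_entry)
print(key_bytes, "bytes of symbolic keys held in the Variable Entry file")
```

Had the Variable Entry records stored Master slot numbers instead, every one
of them would have needed rewriting when the Master records moved.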

• TOTAL uses a random access method for good performance in the case of
simple data bases. TOTAL can achieve an access to a record with an average
of 1.1 seeks.

Provided  the  data  base  does  not  require  extended  following  of  link  paths 
to  several  related  files,  the  performance  can  be  very  good. 

However, as the complexity of the data base increases and more files
are added to contain new data, the number of accesses necessary to
retrieve all related records in the various files can increase
significantly.

• To  some  extent,  these  additional  accesses  may  be  controlled  through  the  use 
of  coded  records  within  a Variable  Entry  file. 

Thus,  if  the  data  base  and  application  requirements  enable  different 
record  types  to  reside  within  the  one  Variable  Entry  file  as  different 
coded  records,  several  logical  records  may  be  retained  within  a physical 
block  to  reduce  the  number  of  I/O  accesses  necessary. 

However, the fixed length restriction of all records within a Variable
Entry file may result in unused disk space if the several record types
exhibit widely differing record lengths.

• Subsets of TOTAL, offering limited function compared with TOTAL 7 and
which have reduced memory requirements and reduced instruction path length
for improved performance, can be used.


DISTRIBUTED  PROCESSING 


Cincom has indicated an intention to support the IBM 3790 Communications
System through ENVIRON/1.

However,  it  is  questionable  whether  they  can  provide  the  level  of  3790 
support  offered  by  IBM  with  CICS/VS  or  IMS/VS  because  of  the 
software  and  hardware  interrelationships  between  the  3790  and 
CICS/VS  or  IMS/VS. 


It is anticipated that ENVIRON/1 will be able to communicate with the 3790,
but the extent to which it will participate with the 3790 in message
resynchronization and recovery following a system failure is not known.

Also, it is not known whether the ENVIRON/1-3790 support will enable
existing 3270-based applications to migrate without change to the 3790
and process using 3270s as is provided by CICS/VS and IMS/VS.


It is not known whether ENVIRON/1-3270 support will also permit
concurrent processing of 3790 applications locally, together with
concurrent host communication for 3270-compatible applications, batch
data transmission and RJE processing, as is provided by CICS/VS and
IMS/VS support for 3790.

DISTRIBUTED  DATA  BASE 


TOTAL provides Distributed Data Base support through its ability to be used
on many different CPUs such as Digital Equipment and Varian minicomputers,
and IBM System/3. However, the support offered is for standalone operation.

• Fully effective Distributed Data Base also requires the ability to
communicate with a remote host CPU - that is, Distributed Processing.

• ENVIRON/1 at present does not support Distributed Processing to the extent
offered by IBM. However, the Distributed Data Base support offered by IBM is
only for large distributed System/370s (Models 148-168 through DL/I and the
IMS/VS Multiple Systems Coupling Feature). The data base support offered by
IBM for its minicomputers (3600, 3650, 3790, 3770, etc.) is not DL/I.

V


DATA  ANALYSIS  - A NEW  METHOD  OF  STRUCTURING  DATA 


A.  THE  IMPORTANCE  OF  DATA  ANALYSIS 


• Many  organizations  implementing  data  base  applications  have  used  the  DBMS 
as  a sophisticated  data  management  access  method,  rather  than  utilize  the  full 
potential  offered  by  data  base.  These  organizations  have  not  designed  their 
applications  for  data  base  - instead  they  have  abandoned  their  applications  to 
data  base. 

• Many  data  base  applications  have  been  developed  using  as  a basis  the  record 
formats  used  in  previous,  traditional,  data  management  files.  This  limits  the 
effective  use  of  data  base.  Such  data  files  generally  have  been  developed  to 
satisfy  the  specific  requirements  of  a particular  program  or  group  of 
programs,  rather  than  reflect  the  application  information  requirements  of  the 
organization. 

• Data  analysis  methodology  is  the  key  to  the  successful  use  of  the  data  base 
concept.  It  enables  an  organization  to: 

Express its information requirements in terms of "data structures,"
which can subsequently be implemented as normal records with
traditional access methods or with a DBMS.

Design data structures that are most likely to survive both future
application changes and changes in data base software technology,
while minimizing the amount of program modification necessary.


B.  OVERVIEW  OF  MODELS 


• Data Analysis results in the development of three types of data models:

Relational.

Structural.

Physical.


• A Relational Model considers the information requirements of an organization
first and applies relational theory to the definition of data relationships.
These relationships are considered as a series of tables of information,
related by common "keys."


• A Structural  Model  uses  the  Relational  Model  as  a basis  and  factors  the  access 
requirement  and  frequency  of  different  applications  against  the  data  contained 
in  the  relational  models.  A Structural  Model  developed  in  this  fashion  can  be 
subsequently  implemented  using  either  traditional  data  management 
techniques  or  data  base  techniques. 

• A Physical Model considers the implementation of the defined Structural
Model based on a particular data base management system (DBMS). It is this
approach that considers the various characteristics of the particular DBMS
products being used and enables a physical data base to be designed according
to those product characteristics.

• Data base design, in the past, has generally only considered the last of the
above three models - the Physical Model. Data analysis permits the definition
of Relational Models and Structural Models considering only the information
requirements of an organization and without being concerned about the
physical constraints of a particular data base management system product.

• The  Data  Analysis  methodology,  while  generally  used  by  an  organization's 
systems  analysts  and  data  base  administrators,  can  also  be  applied  by  the 
organization's  user  department  personnel. 

User  department  personnel  may  not  have  any  particular  computer 
experience  but  instead  have  very  good  knowledge  of  the  applications 
being  implemented. 

This  application  experience,  together  with  the  Data  Analysis 
methodology,  enables  user  department  personnel  to  apply  their 
knowledge  to  the  design  of  data  structures  that  can  subsequently  be 
used  by  the  Data  Base  Administrator  to  design  the  Physical  Model  of  an 
application  data  base. 

C.  RELATIONAL  MODEL 


• The  starting  point  for  the  definition  of  a Relational  Model  is  the  consideration 
of  those  business  rules  that  apply  to  a specific  application  in  the  organization. 

• A formal statement of the business rules indicates the way in which the
organization carries out the processing associated with an application.

While such rules could change in the future, they are generally
fundamental to the way in which that organization carries out business,
and hence are unlikely to change frequently.

Business rule changes generally involve fundamental changes in the
company policy.

The  business  rules  reflect  management's  statement  of  the  way  in  which  the 
application  will  be  carried  out.  This  management  statement  is  used  in 
conjunction  with  a detailed  analysis  of  the  data  input  and  output  reporting 
requirements  of  the  application. 

As an example, Exhibit V-1A represents some of the typical rules which would
apply in an organization with a Personnel and Project Costing application.

From  examination  of  the  existing  system  and  from  discussions  with 
users,  it  becomes  clear  that  details  of  name,  address,  dependents,  and 
salary  are  kept  for  each  employee,  together  with  the  department  each 
works  for  and  the  project  each  is  working  on. 


Source  documents,  such  as  a personnel  form,  identify  the  data  input 
that  is  provided  for  the  application. 


The output requirements of the application are similarly identified in
various documents, such as the typical Employee Report shown in
Exhibit V-1B and the typical Project Report shown in Exhibit V-1C.


Data  analysis  involves  the  identification  of  all  data  fields  required  by  the 
application,  together  with  an  indication  of  whether  that  data  is  required  on 
input  or  on  output. 

The  data  fields  so  defined  are  documented  in  a Data  Dictionary,  which  ensures 
that  each  field  is  uniquely  named  and  identifies  the  various  input  documents  or 
reports  in  which  that  field  is  involved. 


Exhibit  V-2  shows  a typical  example  of  a simple  manual  Data 
Dictionary. 
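
A manual Data Dictionary of this kind translates readily into a keyed
structure. The sketch below is one possible representation, assuming the
format codes used in Exhibit V-2 (a length followed by N, A or X, with "xn"
marking a repeating field); the names `FieldFormat` and `parse_format` are
illustrative.

```python
import re
from typing import NamedTuple

KINDS = {"N": "numeric", "A": "alphabetic", "X": "alphanumeric"}


class FieldFormat(NamedTuple):
    length: int
    kind: str
    repeating: bool   # True when the entry is marked "xn"


def parse_format(code):
    """Parse a format code such as "6N" or "10A xn"."""
    match = re.fullmatch(r"(\d+)([NAX])(\s+xn)?", code.strip())
    if match is None:
        raise ValueError(f"unrecognized format code: {code!r}")
    length, kind, repeat = match.groups()
    return FieldFormat(int(length), KINDS[kind], repeat is not None)


# A few entries taken from Exhibit V-2. Keying the dictionary by field
# name means a second definition of the same name cannot coexist.
dictionary = {
    "EMPLOYEE-NO": "6N",
    "EMPLOYEE-NAME": "15A",
    "ADDRESS": "40X",
    "DEPENDENTS": "10A xn",
}

for name, code in dictionary.items():
    print(name, parse_format(code))

# Repeating fields are the ones First Normal Form will later separate out.
repeating = [name for name, code in dictionary.items() if parse_format(code).repeating]
print("repeating fields:", repeating)
```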



© 1979  by  INPUT,  Palo  Alto,  CA  94303.  Reproduction  Prohibited. 




EXHIBIT  V-1 


PERSONNEL  AND  PROJECT  COSTING 
APPLICATION  - AN  EXAMPLE 


A. BUSINESS RULES


• EACH PROJECT CAN CONSIST OF MORE THAN ONE TASK.

• TASKS ARE NUMBERED UNIQUELY ONLY WITHIN EACH PROJECT.

• THE CHARGE RATE FOR EACH EMPLOYEE IS DERIVED FROM
HIS EMPLOYEE NUMBER AND THE DEPARTMENT FOR WHICH
HE WORKS.


B. EMPLOYEE REPORT

EMPLOYEE  PROJECT  EMPLOYEE    DEPENDENTS  DEPARTMENT              LOCATION
NO.       NO.      NAME

427       31       JONAS D.W.  JONAS P.C.  DRAFTING                BLDG. 16
513       58       ALLEN K.V.  ALLEN C.    SURVEY                  BLDG. 29
695       86       LYONS M.B.              ELECTRICAL ENGINEERING  BLOCK 3A

C. PROJECT REPORT

PROJECT NO.  DEPARTMENT NO.  CHARGE RATE  TASK

58           16              $14.00       916
58           16              $16.00       292
58           16              $12.50       916
86           16              $16.00       165
86           22              $11.00       827


EXHIBIT V-2


SIMPLE MANUAL DATA DICTIONARY


EMPLOYEE DATA                           FIELD NAME      DESCRIPTION

EMPLOYEE #                              EMPLOYEE-NO     6N
NAME                                    EMPLOYEE-NAME   15A
ADDRESS                                 ADDRESS         40X
DEPENDENT (MORE THAN ONE PER EMPLOYEE)  DEPENDENTS      10A xn
SALARY                                  SALARY          6N
SKILL CLASSIFICATION                    SKILL           5X
CHARGE RATE                             CHARGE-RATE     4N
DEPARTMENT #                            DEPT-NO         4N
DEPARTMENT NAME                         DEPT-NAME       10A
DEPARTMENT LOCATION                     DEPT-LOCATION   3N
PROJECT #                               PROJECT-NO      5N
TASK #                                  TASK-NO         5N

PROJECT DATA                            FIELD NAME      DESCRIPTION

PROJECT #                               PROJECT-NO      5N
PROJECT START DATE                      PROJECT-START   8N
PROJECT SCHEDULED FINISH DATE           PROJECT-FINISH  8N
PROJECT BUDGET                          PROJECT-BUDGET  8N
TASK # (MORE THAN ONE PER PROJECT)      TASK-NO         5N xn
TASK NAME                               TASK-NAME       15A xn
TASK MAN-HOURS ESTIMATE                 TASK-MANHRS     5N xn

N=NUMERIC  A=ALPHABETIC  X=ALPHANUMERIC
xn=NUMBER OF OCCURRENCES


The various data fields identified in this way and documented in the Data
Dictionary are expressed in terms of a number of relations.


A relation  can  be  considered  as  a 2-dimensional  table,  with  the 
members  of  the  set  forming  the  rows  of  the  table,  and  each  member's 
characteristics  forming  the  columns  in  that  table. 

An  identifying  field  or  column  (such  as  Employee  number)  is  defined  as 
the  "key"  to  access  the  table,  as  shown  in  Exhibit  V-3A. 

• This  table  can  be  expressed  in  a short-hand  way  with  the  name  of  the  table 
becoming  the  name  of  a relation  and  the  columns  of  the  table  becoming 
columns  (or  "domains")  of  the  relation.  The  key  field  is  identified  by 
underlining,  as  illustrated  in  Exhibit  V-3B. 

• In  a similar  way,  all  of  the  information  requirements  of  the  application,  as 
expressed  in  the  various  input  and  output  documents,  are  expressed  as 
relations. 

• Relational  theory  then  examines  the  content  of  each  relation  to  ensure  that 
only  fields  that  are  uniquely  identified  by  the  appropriate  keys  are  contained 
within  those  relations.  Fields  that  are  not  fully  dependent  on  a key  should  be 
separated  into  different  relations.  The  process  of  decomposing  the  initial 
relations  in  this  way  is  called  "Normalization." 

• Normalization  is  a formal  approach  to  the  analysis  of  data.  It  provides  a 
methodology  to  examine  the  data  fields  in  relations,  separating  out  those 
fields  of  like  nature  into  different,  separate  relations.  Normalization  proceeds 
through  three  stages,  called  First  Normal  Form,  Second  Normal  Form  and 
Third  Normal  Form. 

• It is not until relations have been decomposed into Third Normal Form that a
certain amount of stability can be achieved in the resulting data structure.
For example, a relation in First Normal Form contains fields that may
subsequently change, and in so changing may affect other fields in the
relation.


EXHIBIT V-3

NORMALIZATION OF RELATIONS IN THE PERSONNEL
AND PROJECT COSTING EXAMPLE


A. [Table illustration of the EMPLOYEE relation and its key; not reproduced]


B. SHORTHAND EXPRESSION

EMPLOYEE       (EMPLOYEE #, EMPLOYEE NAME, ADDRESS, DEPENDENTS, SALARY, SKILL,
               CHARGE RATE, DEPARTMENT #, DEPARTMENT NAME, DEPARTMENT
               LOCATION, PROJECT #, TASK #)


C. FIRST NORMAL FORM

EMPLOYEE       (EMPLOYEE #, EMPLOYEE NAME, ADDRESS, SALARY, SKILL, CHARGE RATE,
               DEPARTMENT #, DEPARTMENT NAME, DEPARTMENT LOCATION)
DEPENDENT      (EMPLOYEE #, DEPENDENT)
EMPLOYEE-TASK  (EMPLOYEE #, PROJECT #, TASK #)
PROJECT        (PROJECT #, START DATE, FINISH DATE, BUDGET)
TASK           (PROJECT #, TASK #, TASK NAME, ESTIMATE)


D. SECOND NORMAL FORM

EMPLOYEE       (EMPLOYEE #, EMPLOYEE NAME, ADDRESS, SALARY, SKILL,
               DEPARTMENT #, DEPARTMENT NAME, DEPARTMENT LOCATION)
DEPENDENT      (EMPLOYEE #, DEPENDENT)
EMPLOYEE-TASK  (EMPLOYEE #, PROJECT #, TASK #)
CHARGE         (DEPARTMENT #, EMPLOYEE #, CHARGE RATE)
PROJECT        (PROJECT #, START DATE, FINISH DATE, BUDGET)
TASK           (PROJECT #, TASK #, TASK NAME, ESTIMATE)


E. THIRD NORMAL FORM

EMPLOYEE       (EMPLOYEE #, EMPLOYEE NAME, ADDRESS, SALARY, SKILL,
               DEPARTMENT #)
DEPARTMENT     (DEPARTMENT #, DEPARTMENT NAME, DEPARTMENT LOCATION)
DEPENDENT      (EMPLOYEE #, DEPENDENT)
EMPLOYEE-TASK  (EMPLOYEE #, PROJECT #, TASK #)
CHARGE         (DEPARTMENT #, EMPLOYEE #, CHARGE RATE)
PROJECT        (PROJECT #, START DATE, FINISH DATE, BUDGET)
TASK           (PROJECT #, TASK #, TASK NAME, ESTIMATE)

• Returning to the EMPLOYEE relation, we find from the Data Dictionary that
it is three-dimensional, with one employee possibly having several dependents.
First Normal Form requires that "repeating groups," such as Dependents and
Task details, be separated out into new relations, as shown in Exhibit V-3C.

• First  Normal  Form  is  further  decomposed  into  Second  Normal  Form  by 
removing  those  fields  from  each  relation  that  are  only  partially  dependent  on 
the  key  of  the  relation  and  placing  them  in  another  relation  where  they  are 
fully  dependent  on  the  key  of  that  relation. 

• For  example,  "charge  rate"  in  the  EMPLOYEE  relation  is  dependent  on 
employee  number  but  also  on  department  number.  That  is,  it  is  only  partially 
dependent  on  employee  number  and  is  therefore  moved  into  a new  relation 
CHARGE,  where  it  is  fully  dependent  on  the  compound  key  of  Department 
number  and  Employee  number,  as  shown  in  Exhibit  V-3D. 

• Third  Normal  Form  is  then  derived  by  removing  those  fields  in  a relation  which 
are  dependent  not  on  the  key  of  that  relation  but  on  other  fields  in  the 
relation.  For  example,  in  the  EMPLOYEE  relation,  Department  location  and 
Department  name  are  not  dependent  on  Employee  number  but  are  dependent 
on  Department  number.  They  are  said  to  be  "Transitively  Dependent"  on 
Employee  number  - not  fully  dependent  - and  so  are  moved  to  another  relation, 
DEPARTMENT.  All  of  the  relations  are  now  in  Third  Normal  Form,  as  shown 
in  Exhibit  V-3E. 
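
The Second and Third Normal Form steps above amount to projecting a relation
onto subsets of its fields and discarding duplicate rows. The sketch below
shows this for the EMPLOYEE relation. It is a minimal illustration only: the
`project` helper is an assumption, the sample rows are invented, and ADDRESS,
SALARY and SKILL are omitted for brevity.

```python
def project(rows, fields):
    """Project rows onto `fields`, dropping duplicate tuples."""
    seen = []
    for row in rows:
        picked = {name: row[name] for name in fields}
        if picked not in seen:
            seen.append(picked)
    return seen


# EMPLOYEE in First Normal Form (sample values are invented).
employee_1nf = [
    {"EMPLOYEE #": 427, "EMPLOYEE NAME": "JONAS D.W.", "CHARGE RATE": 14.00,
     "DEPARTMENT #": 16, "DEPARTMENT NAME": "DRAFTING",
     "DEPARTMENT LOCATION": "BLDG. 16"},
    {"EMPLOYEE #": 513, "EMPLOYEE NAME": "ALLEN K.V.", "CHARGE RATE": 16.00,
     "DEPARTMENT #": 16, "DEPARTMENT NAME": "DRAFTING",
     "DEPARTMENT LOCATION": "BLDG. 16"},
    {"EMPLOYEE #": 695, "EMPLOYEE NAME": "LYONS M.B.", "CHARGE RATE": 11.00,
     "DEPARTMENT #": 22, "DEPARTMENT NAME": "SURVEY",
     "DEPARTMENT LOCATION": "BLDG. 29"},
]

# Second Normal Form: CHARGE RATE depends on the compound key of
# Department number and Employee number, so it moves to CHARGE.
charge = project(employee_1nf, ["DEPARTMENT #", "EMPLOYEE #", "CHARGE RATE"])
employee_2nf = project(
    employee_1nf,
    ["EMPLOYEE #", "EMPLOYEE NAME", "DEPARTMENT #",
     "DEPARTMENT NAME", "DEPARTMENT LOCATION"],
)

# Third Normal Form: department name and location depend on Department
# number, not on Employee number, so they move to DEPARTMENT.
department = project(
    employee_2nf, ["DEPARTMENT #", "DEPARTMENT NAME", "DEPARTMENT LOCATION"]
)
employee_3nf = project(employee_2nf, ["EMPLOYEE #", "EMPLOYEE NAME", "DEPARTMENT #"])

print(len(employee_3nf), "EMPLOYEE rows")
print(len(department), "DEPARTMENT rows")
print(len(charge), "CHARGE rows")
```

In the sample data the two DRAFTING employees collapse to a single DEPARTMENT
row, so a change of department location is recorded once rather than once per
employee.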

• In the process of normalization, inconsistent data have been removed from
their original relations and form new relations that reflect the commonality of
data.

Such normalized relations are more stable than the original
unnormalized relations and are less likely to be affected by application
changes.

Normalization  identifies  inter-relationships  (through  common  keys)  and 
ensures  consistency  of  data.  It  is  not  concerned  with  the  physical 
implementation  of  this  data. 


D.  STRUCTURAL  MODEL 


The  Relational  Model  described  above  considers  the  decomposition  of  relations 
into  Third  Normal  Form.  The  process  of  normalization  involves  the  definition 
of  additional  relations.  Some  of  these  relations  may  subsequently  be 
consolidated  with  other  relations,  perhaps  moving  away  from  pure  Third 
Normal  Form,  because  of  considerations  such  as  access  requirements  and 
physical  limitations. 

Given  the  various  relations  defined  in  the  Relational  Model,  these  relations  are 
examined  in  terms  of  the  various  reports  or  queries  required  by  the 
application.  Such  reports  or  queries  identify  the  various  access  paths  through 
the  different  relations. 

Typical application queries processed to give specific reports and the
relation access paths that are referenced are diagrammed in Exhibit V-4.


These different queries reflect the various relational views of the application
and the logical accessing and processing of data by application programs.


EXHIBIT V-4


TYPICAL QUERY ACCESS PATHS

[Access path diagrams for Queries A, B and C; not reproduced]


• The various relational views are combined into a summary Relational Map,
with the frequency of occurrence of each relation recorded by the side of each
relation block in the diagram, based on the statement of the business rules.
An example is shown in Exhibit V-5.

• Having  identified  various  relation  frequencies  in  this  way,  the  number  of 
relation  accesses  through  a relational  view  are  identified. 

• Each  application  query  or  report  will  need  to  reference  different  relations  in 
the  Relational  Map,  to  extract  the  necessary  information  to  satisfy  the  report 
or  query.  These  references  identify  "access  paths"  through  the  Relational 
Map. 


Such  access  paths  may  traverse  relations  where  no  relationships  exist, 
producing  a Usage  Map. 


• Exhibit V-6 takes the typical query access paths and calculates the total
number of relations referenced by each query (the Relation Count below). The
relative access loads which each query will contribute to the overall
application performance are identified. Multiplied by the daily volume of each
query, the Daily Relation Count will then indicate the overall application
performance load.
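
The calculation described above can be worked through directly. The sketch
below takes the relation count terms and daily query volumes stated in
Exhibit V-6 and multiplies them out; the dictionary layout and function names
are illustrative only.

```python
# Maximum relation counts per query, written out term by term as in
# Exhibit V-6, together with the daily volume of each query.
queries = {
    "A": {"terms": [1, 10, 10 * 15, 10 * 15, 10 * 15], "daily_volume": 2},
    "B": {"terms": [20 * 1, 20 * 10], "daily_volume": 10},
    "C": {"terms": [1, 15, 15, 15, 1], "daily_volume": 10},
}


def relation_count(query):
    """Total number of relation references made by one run of the query."""
    return sum(query["terms"])


def daily_relation_count(query):
    """Relation references contributed per day by this query."""
    return relation_count(query) * query["daily_volume"]


for name, query in queries.items():
    print(name, relation_count(query), daily_relation_count(query))

heaviest = max(queries, key=lambda name: daily_relation_count(queries[name]))
print("heaviest daily load: Query", heaviest)
```

Although a single run of Query A references the most relations, it is the
daily volume that decides which query dominates the overall load.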


• By  analyzing  the  relation  access  of  each  query  in  a similar  way  and  applying 
the  volumes  and  frequency  of  each  query,  the  most  heavily  referenced  access 
paths  and  relations  can  be  identified. 

Such access path analysis enables the identification of performance-
critical queries in the application (e.g., Query B in the
project/employee example).


EXHIBIT V-5


EXAMPLE OF RELATIONAL MAP WITH FREQUENCY COUNTS


THE PROJECT/EMPLOYEE EXAMPLE HAS HAD THE FOLLOWING INFORMATION
SUPPLIED ABOUT FREQUENCIES:

• EACH EMPLOYEE HAS AN AVERAGE OF THREE DEPENDENTS

• THERE ARE TEN DEPARTMENTS, AND 500 EMPLOYEES

• THERE ARE TWENTY PROJECTS AND 200 TASKS

• EACH EMPLOYEE WORKS AN AVERAGE OF SIX TASKS AT ANY ONE TIME

• (THERE IS ONLY ONE CURRENT CHARGE RATE PER EMPLOYEE - THIS
WAS DEFINED EARLIER.)


EXHIBIT V-6


TYPICAL QUERY ACCESS PATHS WITH RELATIONAL COUNTS APPLIED


A. REPORT FOR A PARTICULAR PROJECT THE SALARIES AND CHARGE RATES FOR ALL
EMPLOYEES ON EACH TASK OF THAT PROJECT:

[Access path diagram not reproduced]

MAXIMUM RELATIONAL COUNT: 1 + 10 + (10x15) + (10x15) + (10x15) = 461
DAILY QUERY VOLUME: 2 (ACCORDING TO APPLICATION REQUIREMENTS)
DAILY RELATION COUNT: 461 x 2 = 922


B. COMPARE THE CURRENT ESTIMATES OF ALL TASKS AGAINST THE BUDGET FOR THEIR
ASSOCIATED PROJECTS:

RELATIONS REFERENCED: PROJECT (20), TASK (10)

MAXIMUM RELATIONAL COUNT: (20x1) + (20x10) = 220
DAILY QUERY VOLUME: 10 (ACCORDING TO APPLICATION REQUIREMENTS)
DAILY RELATION COUNT: 220 x 10 = 2200


C. FIND THE DEPARTMENTS INVOLVED ON A TASK, AND THE CHARGE RATES BEING
APPLIED FOR A SPECIFIC EMPLOYEE WHO IS WORKING ON THAT TASK:

RELATIONS REFERENCED: TASK (1), EMPLOYEE-TASK (15), EMPLOYEE (15),
DEPARTMENT, CHARGE (1)

MAXIMUM RELATIONAL COUNT: 1 + 15 + 15 + 15 + 1 = 47

(IF DEPARTMENT IS REACHED VIA CHARGE, THE MAXIMUM
RELATIONAL COUNT WOULD BE 1 + 15 + 15 + 15 + 15 = 61)

DAILY QUERY VOLUME: 10 (ACCORDING TO APPLICATION REQUIREMENTS)
DAILY RELATION COUNT: 47 x 10 = 470


The Structural Model represents a consolidation of the various relation views
and incorporates this access path frequency. Based on the various access
paths, candidates for consolidation of relations because of performance
requirements can be identified.

Superimposing  the  various  access  paths,  as  identified  above,  on  the  Relational 
Map,  a Composite  Map  (Exhibit  V-7)  is  produced. 

This Composite Map may indicate access paths for which no relationship
exists.

It  also  may  identify  relationships  which  are  not  accessed. 
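
The comparison just described can be sketched as a pair of set differences.
The following is a minimal illustration only: the relationship list and the
hops assigned to each query are assumptions chosen to agree with the
PROJECT/EMPLOYEE example and with Exhibit V-7, and the names `relationships`,
`access_paths` and `compare` are hypothetical.

```python
# Relationships assumed to be drawn on the Relational Map (illustrative only).
relationships = {
    frozenset({"PROJECT", "TASK"}),
    frozenset({"TASK", "EMPLOYEE-TASK"}),
    frozenset({"EMPLOYEE", "EMPLOYEE-TASK"}),
    frozenset({"EMPLOYEE", "CHARGE"}),
    frozenset({"EMPLOYEE", "DEPENDENT"}),
    frozenset({"DEPARTMENT", "CHARGE"}),
}

# Hops assumed to be traversed by each query (illustrative only).
access_paths = {
    "A": [("PROJECT", "TASK"), ("TASK", "EMPLOYEE-TASK"),
          ("EMPLOYEE-TASK", "EMPLOYEE"), ("EMPLOYEE", "CHARGE")],
    "B": [("PROJECT", "TASK")],
    "C": [("TASK", "EMPLOYEE-TASK"), ("EMPLOYEE-TASK", "EMPLOYEE"),
          ("EMPLOYEE", "DEPARTMENT"), ("EMPLOYEE", "CHARGE")],
}


def compare(relationships, access_paths):
    """Return (missing, unreferenced) relationships as sets of frozensets."""
    used = {frozenset(hop) for hops in access_paths.values() for hop in hops}
    missing = used - relationships        # accessed, but no relationship exists
    unreferenced = relationships - used   # relationship exists, never accessed
    return missing, unreferenced


missing, unreferenced = compare(relationships, access_paths)
```

Under these assumptions the missing relationship is EMPLOYEE - DEPARTMENT and
the unreferenced relationships are EMPLOYEE - DEPENDENT and DEPARTMENT - CHARGE.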


The  Composite  Map  is  modified  to  satisfy  the  application  requirements  as 
identified  by  the  access  paths,  or  to  meet  possible  future  application  require- 
ments. 

At no point does the development of a Structural Model by the above approach
require a knowledge of the particular physical implementation of those
data structures. Instead, data are grouped into relations based on the input
and output documents and business rules of the application.

The  final  stage  of  development  of  the  Structural  Model,  prior  to  development 
of  the  Physical  Model,  considers  the  various  application  reference  loadings 
introduced  by  accessing  relations  to  extract  information  required  to  satisfy 
application  queries  and  reports. 

Each  query  or  report  contributes  to  the  reference  "load"  on  each  relationship 
which  must  be  traversed  through  the  Composite  Map. 

Queries  A and  C contribute  to  the  load  along  the  relationship  path: 
EMPLOYEE  - TASK  - EMPLOYEE  in  the  Composite  Map. 
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EXHIBIT  V-7 




EXAMPLE  OF  COMPOSITE  MAP  WITH  PATH  LOADINGS 


UNREFERENCED  RELATIONSHIPS:  EMPLOYEE  - DEPENDENT 

DEPARTMENT  - CHARGE 


MISSING  RELATIONSHIPS:  EMPLOYEE  - DEPARTMENT 


KEY:

USAGE MAP: DIRECTION OF ACCESS PATH

QUERIES: A, B, C, ETC.

FREQUENCIES: 1, 6, 10, 15, 50, ETC.

RELATIONSHIP:  ONE  ► MANY 

(NUMBER  OF  OCCURRENCES) 
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Queries  A and  B both  contribute  to  the  load  along  the  relationship  path: 
PROJECT  - TASK. 

• The  load  applied  by  a query  along  a path  is  dependent  on  the  frequency  of 
occurrence  of  all  relations  in  the  path,  as  shown  earlier.  The  Relation  Counts 
determined  for  each  of  queries  A,  B and  C indicate  the  total  number  of 
relations  accessed  to  resolve  each  query. 
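
As an illustration of how a Relation Count follows from the frequencies of
occurrence, the sketch below rebuilds the counts for queries A and B. It is
a hedged reconstruction, not part of the methodology itself: the function
name `relation_count` and the variable names are hypothetical, and the
average of 15 employees per task is derived here from the Exhibit V-5
figures (500 employees x 6 tasks / 200 tasks), which agrees with the 15 used
in Exhibit V-6.

```python
# Frequencies supplied in Exhibit V-5.
EMPLOYEES = 500
PROJECTS = 20
TASKS = 200
TASKS_PER_EMPLOYEE = 6

# Derived averages (an assumption of uniform distribution).
tasks_per_project = TASKS // PROJECTS                            # 10
employees_per_task = EMPLOYEES * TASKS_PER_EMPLOYEE // TASKS     # 15


def relation_count(start_occurrences, fanouts):
    """Sum the occurrences touched at every step of an access path.

    fanouts[i] is the average number of occurrences reached from each
    occurrence at the previous step.
    """
    total = level = start_occurrences
    for fanout in fanouts:
        level *= fanout
        total += level
    return total


# Query A: one PROJECT, its TASKs, then EMPLOYEE-TASK, EMPLOYEE and CHARGE.
count_a = relation_count(1, [tasks_per_project, employees_per_task, 1, 1])

# Query B: every PROJECT and the TASKs of each.
count_b = relation_count(PROJECTS, [tasks_per_project])
```

Here `count_a` works out as 1 + 10 + 150 + 150 + 150 and `count_b` as
20 + 200. Query C is left out because its path is not a simple chain of
fan-outs.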

• The  total  contribution  of  each  report  or  query  to  the  overall  load  on  the 
Structural  Model  is,  however,  dependent  not  only  on  the  Relation  Count  of 
each  report  or  query,  but  also  on  the  daily  volume  of  each,  as  demanded  by  the 
application. 

• Thus  the  daily  volume  of  each  application  query  or  report  is  multiplied  by  the 
Relation  Count  along  each  path. 

Consolidating  these  total  daily  relation  loads  for  each  path,  the  overall 
daily  path  loadings  can  be  determined  for  all  paths  in  the  Composite 
Map. 

• These  daily  path  loadings  are  then  superimposed  on  the  Composite  Map  at  the 
head  of  each  access  path  arrow.  For  example,  the  load  of  *2020*  represents 
the  daily  load  of  query  A and  B as  follows: 

Query  A daily  load  on  PROJECT  - TASK  access  path  equals  the  TASK 
relation  count  x daily  Query  A volume,  or  10  x 2 = 20. 

Query  B daily  load  on  PROJECT  - TASK  access  path  equals  the  TASK 
relation  count  x daily  Query  B volume,  or  200  x 10  = 2000. 

Total  accesses  for  PROJECT  - TASK  path  is  the  sum  of  A and  B,  or 
*2020*.

• Similarly, the daily path loadings are determined for all other paths in the
Composite  Map,  incorporating  the  load  on  each  path  contributed  by  all 
application  reports  and  query  references. 
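
The consolidation of daily path loadings can be sketched as a simple
accumulation. This is an illustration under stated assumptions: only queries
A and B are included, so every path other than PROJECT - TASK is incomplete
here, and the per-hop occurrence figures for query A are taken from the terms
of its Maximum Relational Count. The names `queries` and `path_loadings` are
hypothetical.

```python
from collections import Counter

# For each query: daily volume, and occurrences reached along each hop.
queries = {
    "A": {
        "daily_volume": 2,
        "hops": {
            ("PROJECT", "TASK"): 10,
            ("TASK", "EMPLOYEE-TASK"): 150,
            ("EMPLOYEE-TASK", "EMPLOYEE"): 150,
            ("EMPLOYEE", "CHARGE"): 150,
        },
    },
    "B": {
        "daily_volume": 10,
        "hops": {("PROJECT", "TASK"): 200},
    },
}


def path_loadings(queries):
    """Accumulate occurrences x daily volume for every hop of every query."""
    loads = Counter()
    for query in queries.values():
        for hop, occurrences in query["hops"].items():
            loads[hop] += occurrences * query["daily_volume"]
    return loads


loads = path_loadings(queries)
heaviest = max(loads, key=loads.get)
```

With these inputs the PROJECT - TASK load is (10 x 2) + (200 x 10), and that
path carries the heaviest load.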

• The  loaded  Composite  Map  so  produced  identifies  those  paths  with  the 
heaviest  reference  (PROJECT  - TASK  access  path  in  Exhibit  V-7).  Subsequent 
physical  data  base  design  can  thus  utilize  various  DBMS  characteristics  to 
minimize  the  potential  I/O  activity,  and  hence  optimize  performance,  for 
application  access  along  that  path. 

• Relations  may  be  consolidated  for  the  same  reasons,  provided  that  such 
consolidation  does  not  compromise  the  data  stability  introduced  by 
normalizing  to  third  normal  form. 

In  the  example  given,  the  CHARGE  relation  can  be  combined  with  the 
EMPLOYEE  relation,  and  so  eliminate  *450*  accesses. 

• Modifications  to  the  loaded  Composite  Map  for  performance  reasons  may 
require  reiteration  through  normalizing,  the  Relational  Map,  the  Composite 
Map  and  then  the  loaded  Composite  Map.  Such  reiteration  produces  a Final 
Loaded  Composite  Map,  which  reflects  logical  data  consolidation  and  design 
decisions  taken  during  the  data  analysis. 

• The  Final  Loaded  Composite  Map  is  the  Structural  Model  which  is  used  as 
input  to  physical  data  base  design. 

Alternatively,  this  Structural  Model  can  be  used  as  input  to  physical  file 
design,  using  standard  data  management  techniques. 
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• Data  analysis,  using  the  methodology  outlined  in  the  preceding  pages,  is  based 
on  knowledge  of  the  application  procedures,  business  rules  and  requirements. 
The  resulting  logical  data  structure  (the  Structural  Model)  reflects  more 
closely  company  policy,  and  provides  more  stable  data  structures  than  have 
been  achieved  by  using  conventional  file  design  techniques  applied  to  data  base 
design. 

• Data analysis can be used not only by systems analysts and data base
administrators, but also by user department personnel who act as liaison
officers with the DP Department.

• Data  analysis  enables  application  experience  to  be  communicated  to  systems 
analysis  and  data  base  design  in  a standard  format  (the  Structural  Model)  in 
such  a way  as  to  ensure  greater  data  stability,  and  ability  to  accommodate 
new  and  changing  applications  with  minimum  program  modification. 

• The  Structural  Model  is  used  as  the  input  to  physical  data  base  design,  or 
physical  file  design.  The  logical  data  structure  in  the  Structural  Model  can  be 
readily  converted  into  a physical  data  base  design,  using  CODASYL,  Network, 
Hierarchical,  or  Inverted  List  Data  Base  Management  Systems. 

Because  of  its  basis  in  Relational  Theory,  the  Structural  Model  will  also 
permit  ready  migration  to  Relational  Data  Base  systems. 

• The Physical Model now considers the particular implementation
characteristics of the DBMS product to be used, so that the data base can be
designed to take best advantage of that product for optimum performance.

E.  PHYSICAL  MODEL 


• The Physical Model considers the implementation of the Structural Model
based on the particular DBMS product to be used. If required, the Physical
Model can instead be implemented from the Structural Model using traditional
data management access methods rather than a data base management system.

• Considering the access paths and frequencies identified in the Structural
Model and those relations identified as candidates for consolidation owing to
access performance considerations, the particular DBMS product
characteristics are then used to design a physical data base that will offer
optimum access performance.

• DBMS  characteristics  that  permit  the  data  base  administrator  to  specify  the 
grouping  of  related  information  in  the  same  physical  data  base  record  are 
particularly  useful  in  achieving  good  performance  of  high  access  volume  data 
paths. 

• Generally,  each  relation  in  the  Structural  Model  represents  the  format  of  a 
different  data  base  record  or  segment.  The  data  fields  comprising  each 
relation  then  become  data  fields  within  each  data  base  record  or  segment. 

• The  access  paths  identified  in  the  relation  views  of  each  query  also  identify 
the  relationships  between  various  data  base  records  or  segments. 

• The  different  keys  identified  for  accessing  each  relation  (and  derived  from  the 
various  queries  and  relational  views)  are  used  to  identify  the  need  for  any 
secondary  indexes  in  different  data  base  records  or  segments. 

• The  Structural  Model,  developed  as  outlined  above,  can  be  implemented  as  a 
number  of  different  Physical  Models,  depending  upon  the  particular  DBMS 
product  characteristics. 

An example of a typical Physical Model based on a network type of
DBMS is shown in Exhibit V-8.

An  example  based  on  a hierarchical  DBMS  is  shown  in  Exhibit  V-9. 
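
The mapping from a relation to a data base record or segment, with secondary
indexes derived from the access keys, can be sketched as follows. This is a
minimal illustration that assumes no particular DBMS product; the class names
`Relation` and `RecordType`, the function `to_record_type` and the EMPLOYEE
field names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str
    attributes: list
    primary_key: str


@dataclass
class RecordType:
    name: str
    fields: list
    primary_key: str
    secondary_indexes: list = field(default_factory=list)


def to_record_type(relation, access_keys):
    """Turn a relation into a record format.

    Every key used by some query, other than the primary key, becomes a
    candidate secondary index.
    """
    secondary = sorted(k for k in set(access_keys) if k != relation.primary_key)
    unknown = [k for k in secondary if k not in relation.attributes]
    if unknown:
        raise ValueError(f"access keys not in relation: {unknown}")
    return RecordType(relation.name, list(relation.attributes),
                      relation.primary_key, secondary)


employee = Relation("EMPLOYEE", ["EMP-NO", "NAME", "DEPT-NO"], "EMP-NO")
employee_record = to_record_type(employee, access_keys=["EMP-NO", "DEPT-NO"])
```

In this sketch a query that reaches EMPLOYEE by DEPT-NO is what produces the
secondary index; whether such an index is actually built remains a decision
for the physical design.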
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EXHIBIT  V-8 


EXAMPLE  OF  PHYSICAL  MODEL  BASED  ON  A NETWORK  DBMS 
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EXHIBIT  V- 9 



EXAMPLE  OF  PHYSICAL  MODEL 
BASED  ON  A HIERARCHICAL  DBMS 
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VI 


FUTURE  DATA  BASE  AND  APPLICATION  DEVELOPMENT  TRENDS 


A.  FUTURE  DATA  BASE  TRENDS 


The next five to ten years will see further emphasis on distributed data bases,
first based on the larger minicomputers and then migrating down to the
smaller minicomputers and to microcomputers. The first microcomputer DBMS
announcements have been made already (Micro-Seed DBMS on Transaction
Design Labs Xitan Z80 microcomputer in New Jersey).

The distributed data base concerns addressed in Chapter III will demand
further development of distributed data base techniques, particularly in the
areas of security, integrity, and deadlock detection and prevention.

As  the  technology  develops  there  will  be  a wider  distribution  of  processing 
function  than  is  presently  realized. 

The  current  distributed  processing  techniques  are  generally  based  upon 
the  use  of  a centralized  data  base. 

While some limited capability is provided in a standalone environment,
full processing flexibility is only realized on reference to the
centralized data base.

1. DISTRIBUTED DATA BASE TRENDS


• Distributed  data  base  (using  full  DBMS)  will  increase  the  autonomy  of  each 
distributed  location  in  its  ability  to  manage  and  manipulate  its  own  data. 
There  will  still  be  a requirement  for  central  control  of  the  various  distributed 
data  base  nodes,  for  consolidation  of  the  various  remote  data  base  processing 
activity  into  a high-level,  centralized,  organization  data  base. 

• Such  an  organizational  data  base  generally  will  contain  summary  data  rather 
than  the  detailed  operating  data  required  for  the  day-to-day  functioning  of  the 
various  distributed  locations. 

• This evolution of distributed data base will tend to follow the typical
management reporting levels in an organization.

The  distributed  data  base  systems  will  provide  day-to-day  operating 
data  required  by  first  line  management. 

This  data  will  be  consolidated  and  summarized  for  the  requirements  of 
higher  level  management. 

Eventually  the  key  management  information  required  for  corporate 
decision  making  will  be  available. 

• In  the  first  instance,  distributed  data  base  may  tend  to  off-load  a considerable 
amount  of  data  base  processing  to  the  distributed  locations.  This  will  increase 
the  capability  of  the  central  host  system  to  effectively  consolidate  the  various 
organizational  data. 

2.  DATA  BASE  COMPUTERS  ("BACK-END"  DBMS  PROCESSORS) 

• The  concept  of  a front-end  communications  controller  has  become  an  accepted 
fact  in  data  processing  today.  Such  a front-end  controller  off-loads  much  of 
the  processing  previously  carried  out  by  the  host  system  to  control  the 
transmission  of  data  to  and  from  remote  devices. 
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A similar  development  will  emerge  over  the  next  few  years  towards  the  back- 
end DBMS  controller. 

Cullinane, Cincom, and others are currently migrating their DBMS from a
host mainframe to a minicomputer that is channel-connected to the
mainframe. Such a channel-connected minicomputer provides a back-end DBMS
capability.

Back-end DBMS offers the advantage of uniquely designed minicomputers
capable of accessing data bases of billion-byte size.

Such  specialized  minicomputers  can  be  microcoded  to  support  a unique 
instruction  set  which  provides  processing  capability  equal  to,  and  in 
some  cases  exceeding,  the  processing  capability  of  large  mainframes 
using  a generalized  instruction  set. 

In  fact,  in  order  to  obtain  the  necessary  performance  characteristics 
such  microcoding  will  be  mandatory. 

The back-end system comprises the data base management system and supports
the data base according to the physical data base specifications indicated
by the Data Definition Language (DDL), as shown in Exhibit VI-1.

The host mainframe, on the other hand, processes on-line and batch
application programs which continue to request data base operations through
the Data Manipulation Language (DML). Such DML requests are intercepted by a
DML interface routine in the mainframe and then communicated across the
channel to the back-end DBMS.

The  back-end  DBMS  interprets  the  DML  command,  accesses  the  data  base 
using  the  DDL  schema  and  passes  the  requested  data  across  the  channel  to  the 
mainframe.  Subsequent  processing  in  the  mainframe  may  return  that  data  to 
the  minicomputer  for  subsequent  update  of  the  previously  retrieved  data  base 
records. 
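
The request flow between host and back-end can be sketched as a small
simulation. This is an illustration only and assumes nothing about any
vendor's product: the verbs STORE, GET and MODIFY, the classes
`BackEndDbms` and `HostDmlInterface`, and the use of an ordinary function
call to stand in for the channel are all hypothetical.

```python
class BackEndDbms:
    """Stands in for the channel-attached minicomputer holding the data base."""

    def __init__(self, schema):
        # schema: record type -> key field, as it might be declared in the DDL
        self.schema = schema
        self.store = {record_type: {} for record_type in schema}

    def execute(self, request):
        """Interpret one DML request and return a reply."""
        record_type = request["record_type"]
        key_field = self.schema[record_type]
        records = self.store[record_type]
        verb = request["verb"]
        if verb in ("STORE", "MODIFY"):
            data = request["data"]
            records[data[key_field]] = dict(data)
            return {"status": "OK"}
        if verb == "GET":
            found = records.get(request["key"])
            if found is None:
                return {"status": "NOT FOUND"}
            return {"status": "OK", "data": dict(found)}
        return {"status": "INVALID VERB"}


class HostDmlInterface:
    """Intercepts application DML calls and passes them across the channel."""

    def __init__(self, channel):
        self.channel = channel  # any callable that delivers a request

    def call(self, verb, record_type, **arguments):
        return self.channel({"verb": verb, "record_type": record_type, **arguments})


back_end = BackEndDbms({"EMPLOYEE": "EMP-NO"})
host = HostDmlInterface(back_end.execute)

host.call("STORE", "EMPLOYEE", data={"EMP-NO": 17, "SALARY": 1000})
reply = host.call("GET", "EMPLOYEE", key=17)
# The host application processes the record, then returns it for update.
changed = dict(reply["data"], SALARY=1100)
host.call("MODIFY", "EMPLOYEE", data=changed)
final = host.call("GET", "EMPLOYEE", key=17)
```

The application sees only the `call` interface; where the data base is held
and how it is organized stay on the back-end side of the channel.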
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EXHIBIT  VI-1 


DBMS ORGANIZATION IN A DISTRIBUTED
AND BACK-END PROCESSING ENVIRONMENT

(DIAGRAM COMPONENTS: BACK-END PROCESSOR, HOST PROCESSOR,
COMMUNICATIONS PROCESSOR, DISTRIBUTED DATA PROCESSOR)
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The  logical  interface  between  the  application  program  and  the  DBMS  is 
through  the  Data  Manipulation  Language.  The  DML  in  general  permits  a 
separation  between  the  logical  accessing  of  a data  base  by  application 
programs  and  its  physical  organization  as  defined  through  the  Data  Definition 
Language  and  represented  in  the  schema. 

Through the capability of microcomputers, relatively inexpensive
multiprocessors and array processors will emerge for many DBMS functions that
are currently provided in one system.

A number  of  benefits  result  from  the  use  of  a back-end  DBMS: 

Improved Cost/Performance: The lower cost associated with
mini/microcomputers, together with the added advantage of specially
microcoded instruction sets which offer high DBMS performance, presents a
new perspective on the cost/performance of data base management systems.
Overhead associated with large mainframes can be reduced in this
environment. The more processing that can be off-loaded to the back-end
processor, the greater are the potential savings in the host
mainframe in terms of CPU time and memory.

Increased Work Load: As the application work load introduced to a
mainframe increases, the need to maintain a certain level of response
may dictate the upgrading of that mainframe to a higher performing
model at great expense. Off-loading DBMS processing to a back-end
processor may offset the need for such an upgrade.

Increased Storage Requirements: As application workloads increase, so
also may the demands on the storage capability of the host mainframe.
Off-loading of a DBMS to an attached back-end minicomputer can
relieve a high demand for storage resources in the mainframe and free
this storage for increased application processing.

Secondary Storage Interface: Various secondary storage devices can be
connected to a back-end processor, which in turn is capable of
connection to a variety of different host mainframes.

• The need to interface new storage technologies with each different host
mainframe is removed. Such interfacing need be done only once for the back-end
minicomputer and is then available to all of the host mainframes which can
connect to that DBMS minicomputer.

• There  are  some  potential  disadvantages  of  back-end  DBMS: 

Multiple Vendors: In many cases the host mainframe and the back-end
minicomputer may be supplied by different vendors, introducing the
multiple-vendor maintenance problem.

Obsolescence:  While  a back-end  processor  can  enable  a host  mainframe 
to  take  advantage  of  new  technological  developments,  such  a back-end 
may  constrain  the  ability  of  an  organization  to  move  to  a new 
mainframe  environment  which  is  not  capable  of  connecting  to  the  back- 
end DBMS. 

Reliability:  The  more  individual  components  involved  in  a system,  the 
higher  the  probability  of  failure  of  a component  in  that  system  and 
consequently  the  lower  is  the  mean  time  to  failure  of  the  entire  system. 

Performance:  The  performance  measurement  and  prediction  of  a DBMS 
is  a very  complex  undertaking  based  on  today's  mainframe  DBMS 
products.  The  addition  of  back-end  DBMS  adds  a further  level  of 
complexity  when  measuring  performance  and  tuning  the  DBMS  between 
the  mainframe  and  the  attached  minicomputer  DBMS  processor. 
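
The reliability point in the list above can be made concrete with a short
calculation. The sketch assumes that components fail independently with
constant failure rates and that the system fails when any one component
fails; neither assumption is stated in this report. The component figures
and the function name `series_mttf` are hypothetical.

```python
def series_mttf(component_mttfs_hours):
    """Mean time to failure of a system that fails when any component fails.

    With independent, constant failure rates, the rates add, so the system
    MTTF is the reciprocal of the summed component failure rates.
    """
    failure_rate = sum(1.0 / mttf for mttf in component_mttfs_hours)
    return 1.0 / failure_rate


# Hypothetical figures in hours: host alone, then host plus channel link
# plus back-end minicomputer.
host_only = series_mttf([2000.0])
with_back_end = series_mttf([2000.0, 10000.0, 3000.0])
```

Each added component lowers the combined figure, even when the added
component is individually more reliable than the host.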

• A back-end processor can be a very large and powerful computer. Indeed, such
processors will be used to meet the data base requirements of very large
users. The main advantages to these users are:
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Reduction  in  overhead  through  separation  of  functions. 

Improvement  in  performance  through  use  of  microcode. 

• Note should be made that mainframe manufacturers such as IBM can provide
the back-end processor, applications processor and/or communications
processor in the same "box," minus the external channels.

3.  RELATIONAL  DBMS 

• One of the key advantages of data base technology is that of data
independence. Data independence in current DBMS products has been implemented
to varying degrees, enabling application programs to consider logical
processing against the data base without concern for the physical
organization of, or access to, the data base.

Physical  data  independence  is  well  supported  by  most  of  the  DBMS 
products  available  today.  Through  the  Data  Definition  Language,  the 
physical  organization,  physical  structure  and  access  methods  used  to 
implement  the  data  base  can  be  defined  by  the  Data  Base  Administrator. 

The  capability  of  many  of  today's  DBMS  products  to  offer  logical  data 
independence,  on  the  other  hand,  is  more  limited.  Some  DBMS  products 
today  require  the  application  programs  to  have  a knowledge  of  the 
physical  structure  of  the  data  base  (while  not  necessarily  also  requiring 
a knowledge  of  the  physical  organization  or  access  methods  used). 

• Logical  data  independence  is  only  achieved  when  the  application  program  can 
view  the  data  base  in  a way  best  suited  to  the  application  processing 
requirements  without  any  concern  for  the  physical  structure  of  the  data  base. 
It  is  the  responsibility  of  the  DBMS  to  provide  the  necessary  transformation 
from  this  logical  view  to  the  physical  view. 
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• Recognizing  the  limited  capability  of  current  DBMS  products  to  achieve  this 
logical  data  independence,  Codd  of  IBM  in  1969  proposed  the  use  of  relational 
data  bases  to  overcome  these  limitations. 

• A relational data base considers that data exists in the form of
two-dimensional tables. Such tables contain multiple columns (which can be
thought of as similar to the various fields in a record). The rows of the
table represent the many "data base records" making up the entire data base.
Access to a row is made on the basis of a unique key value, which identifies
the row of the table (the record) exhibiting that key value.


• Tabular  forms  of  data  representation  are  natural  to  users  and  relatively  simple 
to  the  data  processing  community.  Hence  the  interest  in  relational  data  base. 

• Relational  theory  permits  the  concise  specification  of  a relational  algebra  and 
relational  calculus  to  indicate  the  accessing  and  manipulation  of  relational 
data  bases. 

• A number  of  operators  have  been  defined  for  the  manipulation  of  relational 
data  base: 

Projection, which selects requested columns from a relation, so forming
one or more new relations.

Join, which is the converse of projection and combines selected columns
from various relations into a new relation, such as one required to
produce a specific report.
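
The two operators can be sketched over relations held as lists of rows. This
is a minimal illustration and not a description of any relational DBMS: the
relations EMPLOYEE and DEPARTMENT, their columns and values, and the function
names `project` and `join` are hypothetical.

```python
def project(relation, columns):
    """Keep only the requested columns, dropping duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        picked = tuple(row[column] for column in columns)
        if picked not in seen:
            seen.add(picked)
            result.append(dict(zip(columns, picked)))
    return result


def join(left, right, on):
    """Combine rows of two relations that share the same value of `on`."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[on] == right_row[on]
    ]


employee = [
    {"EMP-NO": 1, "NAME": "ADAMS", "DEPT-NO": 10},
    {"EMP-NO": 2, "NAME": "BAKER", "DEPT-NO": 10},
    {"EMP-NO": 3, "NAME": "CLARK", "DEPT-NO": 20},
]
department = [
    {"DEPT-NO": 10, "DEPT-NAME": "SALES"},
    {"DEPT-NO": 20, "DEPT-NAME": "RESEARCH"},
]

departments_in_use = project(employee, ["DEPT-NO"])
report = project(join(employee, department, on="DEPT-NO"), ["NAME", "DEPT-NAME"])
```

The report relation is produced without the application stating how either
table is stored or traversed, which is the logical data independence
discussed earlier in this section.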


• The  technique  of  data  analysis,  which  was  discussed  previously,  is  enjoying 
increasing  acceptance  as  a prelude  to  data  base  design.  It  is  fundamentally 
based  upon  the  application  of  relational  theory  to  information  analysis  and 
design. 
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Various  data  analysis  methodologies  are  evolving  including  the  use  of  canonical 
structures  (Martin  and  Date)  as  well  as  standard  relational  techniques. 

In  summary,  considerable  research  has  been  carried  out  in  relational  data  base 
technology,  variously  reported  in  the  literature,  but  effective  and  economic 
implementations  of  the  theory  have  yet  to  emerge  for  large  scale  applications. 
Some  people  doubt  that  they  will  emerge  in  the  near  future.  The  benefits, 
then,  of  relational  theory  may  primarily  be  in  its  use  in  Data  Analysis,  as 
discussed  in  Section  V. 

4. FUTURE OF IBM DATA BASE SOFTWARE

Significant announcements made by IBM during 1977 and 1978 featured
dramatic increases in hardware price/performance.

IBM  303X  systems  have  generally  doubled  cycle  and  storage  capacity 
for  approximately  the  same  dollar  amounts  charged  for  predecessor 
System/370  systems. 

Recent  purchase  price  reductions  for  370/148  and  370/138  models  are 
seen  as  heralding  more  mid-range  system  announcements  before  the  end 
of  1978. 

Accompanying  these  hardware  announcements  are  a series  of  related  software 
developments  including: 

The apparent encouragement of VM/370 as a "hypervisor" running
CICS/IMS/CMS/APL/Batch/RJE, providing an alternative to distributed
processing.

With versions 1.2 and 1.3 of DL/I, IMS now permits multiple jobs to
properly update data bases.

• 1978 should witness IBM ceasing its opposition to the current DDL/DML
revisions to the original CODASYL Data Base Task Group proposals, although it
seems probable that IBM's CODASYL networks will await a new DBMS.

• One speculation is that the bridge from IMS/VS to a new DBMS will be
accomplished with an interpretive Data Dictionary. "Bridge" microcode could
trap existing IMS/VS calls and interface them to the new DBMS. This
capability is said to be under development in two IBM laboratories.

• A new generalized user approach, Query-By-Example (QBE), should be
announced before the end of 1978. The calculus which gave rise to QBE is based
on relational data base methods.

• Two  new  relational  data  base  plans  are  anticipated.  One  is  associated  with  an 
IBM  ORBIT  series  system  called  VENUS,  the  first  of  a new  Communication 
System  series.  The  second  is  associated  with  a system  codenamed  PACIFIC, 
which  is  a new  technology  System/3  successor. 

B.  APPLICATION  DEVELOPMENT  TRENDS 


While  technological  advances  in  hardware  have  permitted  the  development  of 
more  powerful  computers  at  lower  cost,  a similar  improvement  has  not  been 
realized  in  application  development  technology. 

The  productivity  of  application  development  and  maintenance  is  the  single 
most  inhibiting  factor  in  more  effective  utilization  of  computers.  A number 
of  techniques  are  emerging  which  offer  promise  of  improving  this  productivity 
through  greater  automation  of  a number  of  development  and  maintenance 
tasks. The first of these is on-line application development.

1. ON-LINE APPLICATION DEVELOPMENT SYSTEMS

• Application packages are available which are oriented towards the easy
development of on-line applications. Generally, these systems are structured
around the definition of on-line terminal screen formats, with associated
editing and manipulation of various data fields on screen displays or in data
file records.

An  example  of  such  an  on-line  application  development  system  is 
Display  Management  System  (DMS/VS),  a program  product  supported  by 
IBM. 

• Such on-line application development systems generally involve the
specification of screen formats using fill-in-the-blank coding techniques.
Provision is made to relate specific data fields on the screen with the
appropriate data fields in disk files, either for inquiry or on-line update
applications.

• As well as permitting access to standard data files, some of these systems
also enable access to data base management systems. The interface with the
DBMS is generally defined at a high level: records and fields are referred to
by name, without necessarily requiring a knowledge of the physical data base
structure.

2.  DATA  DICTIONARY/DIRECTORY  SYSTEMS  (DD/DS) 

• Data  Dictionary/Directory  Systems  (DD/DS)  have  been  developed  to  assist  the 
data  processing  department  to  manage  the  data  resources  of  an  organization 
more  effectively.  These  data  resources  are  the  "raw  materials"  of  the  DP 
department. 

Not  only  must  the  DP  department  keep  details  of  the  various  data  fields 
and  records  used  by  the  organization,  but  the  various  application 
programs  and  application  systems  which  utilize  that  data  must  be  noted 
as  well. 
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There  is  a requirement  for  cross-referencing  between  data  resources 
and  application  programs  and  systems. 

From  this  cross-referencing  can  be  derived  "where-used"  information 
which  enables  the  DP  department  to  assess  the  impact  of  a change  in 
specific  data  fields,  for  example,  on  all  of  the  various  application 
programs  which  access  those  fields. 
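
The derivation of "where-used" information from such cross-references can be
sketched as the inversion of a program-to-field listing. This is a minimal
illustration; the program names, the field names and the function
`where_used` are hypothetical and do not describe any particular DD/DS
product.

```python
from collections import defaultdict

# Cross-reference as it might be recorded: program -> data fields it accesses.
program_fields = {
    "PAYROLL": ["EMP-NO", "SALARY", "CHARGE-RATE"],
    "PROJECT-REPORT": ["PROJECT-NO", "TASK-NO", "CHARGE-RATE"],
    "PERSONNEL": ["EMP-NO", "DEPT-NO"],
}


def where_used(program_fields):
    """Invert the cross-reference: data field -> set of programs using it."""
    index = defaultdict(set)
    for program, fields in program_fields.items():
        for data_field in fields:
            index[data_field].add(program)
    return index


index = where_used(program_fields)
# Programs to be reviewed if the definition of CHARGE-RATE changes.
impacted = sorted(index["CHARGE-RATE"])
```

Looking up a field in the inverted index gives the list of programs to be
examined before that field is changed.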

Initially,  Data  Dictionary  Systems  were  developed  manually  to  standardize  on 
the  descriptive  information  maintained  for  each  data  field  and  record  in  an 
organization.  As  the  complexity  of  applications  increased  and  with  the 
emergence  of  data  base  management  systems,  this  data  dictionary  capability 
has  been  automated. 

There are a number of DD/DS currently available as separate program
products, developed and marketed by DBMS suppliers and independent
software houses alike. Some of these DD/DS are themselves based on the use of
a DBMS for data management and organization. Typical packages in this
category include DB/DC Dictionary (IBM), TOTAL Data Dictionary (Cincom
Systems), Integrated Data Dictionary (Cullinane), Control 2000 (MRI Systems)
and UCC 10 (University Computing Company).

DD/DS which are not dependent on a DBMS include LEXICON (Arthur
Andersen), Data Catalogue (Synergetics), DATAMANAGER (MSP) and
PRIDE/LOGIK (M. Bryce Associates).

Data Dictionary packages have evolved from "passive" packages, which
required specific input of all information to be maintained by the Data
Dictionary, to integrated packages, which accept existing data definitions as
documented in application program structures (COBOL or PL/I) or in DBMS
Schema definitions. Use of program structures or DBMS schemas avoids the
separate input definition step required of passive Data Dictionaries.
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Similarly,  following  a data  dictionary  change,  integrated  packages  produce  as 
output  the  changed  program  structures  (COBOL  or  PL/I)  or  DBMS  schemas. 
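
The output side of an integrated package can be sketched as the generation
of a record structure from dictionary entries. This is an illustration only:
the dictionary layout, the field names and pictures, and the function
`cobol_record` are hypothetical, and the output is laid out only roughly in
COBOL fixed format.

```python
# Dictionary entries: record name -> list of (field name, COBOL picture).
dictionary = {
    "EMPLOYEE-RECORD": [
        ("EMP-NO", "9(6)"),
        ("EMP-NAME", "X(30)"),
        ("SALARY", "9(7)V99"),
    ],
}


def cobol_record(record_name, fields):
    """Produce a COBOL record description from dictionary entries."""
    lines = [f"       01  {record_name}."]
    for name, picture in fields:
        lines.append(f"           05  {name:<20} PIC {picture}.")
    return "\n".join(lines)


copybook = cobol_record("EMPLOYEE-RECORD", dictionary["EMPLOYEE-RECORD"])

# After a dictionary change, the structure is simply regenerated.
dictionary["EMPLOYEE-RECORD"][2] = ("SALARY", "9(9)V99")
regenerated = cobol_record("EMPLOYEE-RECORD", dictionary["EMPLOYEE-RECORD"])
```

Because the structure is generated rather than hand-maintained, a change made
once in the dictionary reaches every program that copies the generated text.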

Some  DD/DS  also  provide  an  on-line  interface  supporting  inquiry  and  update  of 
data  dictionary  information. 

3. TREND TO SYSTEM DICTIONARIES

The  current  management  of  computer  data  resources  is  fragmented  with  the 
same  data  sometimes  required  to  be  defined  in  many  different  ways. 

Thus,  data  must  be  defined  as  record  formats  and  structures  for  language 
compilers,  as  records  and  files  for  operating  system  control,  as  fields  and 
records  for  data  base  management  systems,  and  as  fields  and  records  for  query 
languages. 

This multiple definition of the same information is inefficient and a
considerable source of error. The same data may be defined in many different
places in an installation. Changes to that data must be reflected in all of
its definitions. However, the amount of detail required in each definition
varies and the likelihood of incomplete incorporation of data changes is high.

The  present  disjointed  data  resource  control  environment  has  evolved  from 
several  directions  (languages,  operating  systems  and  DBMS).  The  need  is 
already  evident  for  a single  integrated  system  dictionary  which  is  the  central 
repository  of  information  describing  the  various  data  resources  used  by  an 
installation. 

This  system  dictionary  should  be  the  key  central  control  point  of  all 
data  resources  providing  information  required  by  language  compilers, 
operating  systems  and  DBMS. 
A change occurring in the definition or use of a particular data field
would be incorporated in the system dictionary and thus be immediately
available to all system components which access that dictionary.

Such a system dictionary then becomes a system-wide data base
information element, as shown in Exhibit VI-2.
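
The idea of defining a field once and deriving every other definition from it
can be sketched as follows. This is an illustration only, not a description
of any product in this report: the FieldDef entry, the type codes "alpha" and
"numeric", and the two rendering functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """One dictionary entry: a data field defined once for the installation."""
    name: str    # field name, e.g. "CUSTOMER-NAME"
    kind: str    # "alpha" or "numeric" (hypothetical type codes)
    length: int  # length in characters or digits


# Hypothetical system dictionary holding a single definition per field.
DICTIONARY = {
    "CUSTOMER-NO": FieldDef("CUSTOMER-NO", "numeric", 6),
    "CUSTOMER-NAME": FieldDef("CUSTOMER-NAME", "alpha", 30),
}


def cobol_record(record_name, field_names, dictionary=DICTIONARY):
    """Render a COBOL-style record description from dictionary entries."""
    lines = [f"01  {record_name}."]
    for name in field_names:
        f = dictionary[name]
        pic = f"9({f.length})" if f.kind == "numeric" else f"X({f.length})"
        lines.append(f"    05  {f.name}  PIC {pic}.")
    return "\n".join(lines)


def schema_fields(field_names, dictionary=DICTIONARY):
    """Render the same entries as generic (name, type, length) schema items."""
    return [(dictionary[n].name, dictionary[n].kind, dictionary[n].length)
            for n in field_names]
```

Because both renderings read the same entry, changing the length of
CUSTOMER-NAME in the dictionary changes the compiler record and the schema
item together, which is the control point the text describes.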

AUTOMATED  APPLICATION  DEVELOPMENT  SYSTEMS 

The  present  on-line  application  development  systems  generally  bypass  the  need 
for  a considerable  amount  of  application  programming  in  the  acceptance  of 
terminal  input,  the  accessing  of  data  file  records  and  manipulation  of  data 
fields,  and  the  preparation  and  display  of  terminal  output  information. 
However,  present  on-line  application  development  techniques  are  not  fully 
integrated  and  generally  require  the  additional  coding  of  application  program 
interfaces  to  permit  full  data  editing,  conditional  logic  and  processing  to  be 
carried  out. 

Automated application development systems are emerging which are structured
around application dictionaries. They permit the on-line definition of
screen formats, conditional logic and processing. From such an application
definition, the system automatically generates the application programs
needed to carry out the specified screen and file management with the
required conditional logic and processing.

Such an approach often bypasses the need for detailed application coding.
Instead, the systems analyst (or suitably trained user department personnel)
can develop the required screen formats for an application at a terminal. The
analyst can also specify the appropriate conditional logic to be applied to
specific data fields on a screen display (or in a data file record), and then
indicate the processing necessary to satisfy the application requirements.

Fundamental to such an approach is an application dictionary. This application
dictionary stores all defined formats, conditional logic and processing
specifications for subsequent use in application program generation.
Additionally, storage of these specifications enables applications maintenance
to be carried out rapidly and accurately, using previous definitions stored in
the dictionary and changing only those necessary to effect the required
application modification.

© 1978 by INPUT, Menlo Park, CA 94025. Reproduction Prohibited.

EXHIBIT VI-2

SYSTEM DICTIONARY POSITION

• Following  application  definition,  or  application  maintenance,  the  various 
screen  input  and  output  formats,  report  formats,  conditional  logic  and 
processing  statements  are  used  by  the  automated  application  development 
system  to  generate  the  necessary  application  programs,  usually  in  a high  level 
language such as COBOL or PL/I, as depicted in Exhibit VI-3.

• This  approach  bypasses  the  need  for  detailed  application  coding  and  utilizes 
the  capability  of  the  computer  to  translate  application  definitions  into  the 
necessary  programs  to  carry  out  required  processing. 


Such  generated  application  programs  are  executed  by  the  automated 
application  development  system  under  control  of  a monitor.  The 
monitor  provides  much  of  the  common  data  entry  and  editing  function 
for applications, with release of completed batches of data for
subsequent update processing, report preparation, and inquiry.




In  an  on-line  environment  such  application  monitors  also  provide  for 
immediate  on-line  editing  and  update  of  relevant  data  fields. 
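
The relationship between stored definitions, generated editing logic and the
monitor can be sketched as follows. This is a simplified illustration under
stated assumptions: the screen definition layout, the field names and the
rule keywords are hypothetical, and the sketch builds an in-memory editing
routine rather than generating a COBOL or PL/I program as the systems
described here would.

```python
# Hypothetical application dictionary entry for one screen. Each field is
# referenced by name and carries its editing rule as data, not as code.
ORDER_SCREEN = {
    "fields": [
        {"name": "QUANTITY", "type": "numeric", "min": 1, "max": 999},
        {"name": "PART-NO", "type": "alpha", "length": 8},
    ]
}


def generate_editor(screen):
    """Build an editing routine from the stored screen definition."""
    def edit(entry):
        errors = []
        for spec in screen["fields"]:
            value = entry.get(spec["name"], "")
            if spec["type"] == "numeric":
                if not value.isdigit():
                    errors.append(f"{spec['name']}: not numeric")
                elif not spec["min"] <= int(value) <= spec["max"]:
                    errors.append(f"{spec['name']}: out of range")
            elif len(value) != spec["length"]:
                errors.append(f"{spec['name']}: wrong length")
        return errors
    return edit


def monitor(entries, edit):
    """Apply the common editing function and release clean entries as a batch."""
    batch, rejected = [], []
    for entry in entries:
        errors = edit(entry)
        if errors:
            rejected.append((entry, errors))
        else:
            batch.append(entry)
    return batch, rejected
```

Maintenance in this sketch means changing the stored definition, for example
raising the QUANTITY maximum, and regenerating the editor. No hand-written
editing code needs to be touched.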


In an automated application development system environment, the systems
analyst is not concerned with the physical considerations of application logic
or data storage.


Application  requirements  are  defined  in  application  terms  rather  than 
computer  terms,  and  data  is  referenced  by  name. 


This assumes, therefore, that the automated application development
system is able to identify all data fields used in an installation for
application definition and maintenance purposes, and subsequently to
access and operate against that data during actual production processing.

This capability implies a highly integrated environment offering a
consolidation of capabilities presently provided separately by on-line
application development systems, data dictionaries and data base
management systems.

EXHIBIT VI-3

AUTOMATED APPLICATION DEVELOPMENT SYSTEM

• As  automated  application  development  systems  evolve,  we  will  see  the 
development  of  an  integrated  application  definition  and  data  dictionary 
system.  This  system  will  contain  interfaces  to  standard  data  management 
files  and  to  the  most  common  data  base  management  systems  in  current  use. 

• This evolution will permit a significant productivity improvement in
application development and maintenance while still providing an interface to
existing application systems through current data management files and DBMS
data bases.

• As  the  industry  moves  to  a more  productive  development  and  maintenance 
environment,  the  need  for  detailed  management  and  control  of  data  through 
DBMS  products  will  change  in  emphasis.  We  will  therefore  see  greater 
integration  of  the  functions  currently  provided  by  DBMS  products  into  an  all- 
encompassing  application  development  system.  The  computer  will  have 
complete  responsibility  for  the  allocation  of  data  fields  to  appropriate  sections 
of  an  installation-wide  data  base  and  the  retrieval  of  those  data  fields  for 
processing  when  required. 

• Of course, such a generalized approach will require a great deal of processing
and data storage capability. However, technological improvements in
processing, and particularly in storage, will permit rapid access to massive
amounts of data that it is not economical to hold using today's technology.
APPENDIX: DATA BASE SOFTWARE EVALUATION CRITERIA


DBMS  SELECTION  METHODOLOGY 


1.  Determine  key  requirements  that  a DBMS  must  satisfy  for  development  of  the 
data  base  under  consideration. 

2.  Identify  possible  DBMS  for  consideration,  based  on  factors  such  as  computer 
configuration  (either  current  or  future)  and  DBMS  direction. 

3.  Develop  a suggested  data  base  structure,  showing  how  each  development  stage 
would  be  incorporated  in  the  data  base,  and  assess  the  extent  to  which  the 
data  base  may  be  redesigned  partway  through  development. 

4.  Prioritize  the  DBMS  evaluation  criteria,  based  upon  the  requirements  and  the 
projected  data  base  development  as  determined  above.  Enter  the  priority 
weights  on  the  evaluation  sheets.  (See  following  pages.) 

5. Evaluate the DBMS products identified in Point 2 above, using the prioritized
criteria from Point 4, and complete the evaluation sheets.

6.  Determine  the  DBMS  product  that  best  meets  the  prioritized  criteria. 
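
Points 4 to 6 can be sketched as a small calculation. The evaluation sheets
carry TOTAL, PRIORITY WEIGHT, MANDATORY FEATURES and RESULTS entries but do
not state how they combine, so this sketch makes two assumptions: RESULTS for
a sheet is its TOTAL multiplied by its priority weight, and a product lacking
any mandatory feature is excluded outright. The product names, scores and
weights are invented for illustration.

```python
# Hypothetical priority weights per criterion (Point 4).
WEIGHTS = {"Data Integrity": 3, "Ease of Use": 1}

# Hypothetical row scores per sheet for each candidate product (Point 5).
SHEETS = {
    "DBMS 1": {"Data Integrity": [2, 3], "Ease of Use": [4, 4]},
    "DBMS 2": {"Data Integrity": [4, 4], "Ease of Use": [1, 2]},
    "DBMS 3": {"Data Integrity": [5, 5], "Ease of Use": [5, 5]},
}

# Whether each product offers every mandatory feature (assumed pass or fail).
HAS_MANDATORY = {"DBMS 1": True, "DBMS 2": True, "DBMS 3": False}


def sheet_result(row_scores, priority_weight):
    """RESULTS for one sheet, assumed to be TOTAL (sum of rows) times weight."""
    return sum(row_scores) * priority_weight


def evaluate(sheets, weights, has_mandatory):
    """Return the overall result per product, or None if it is excluded."""
    overall = {}
    for product, by_criterion in sheets.items():
        if not has_mandatory[product]:
            overall[product] = None  # missing a mandatory feature
            continue
        overall[product] = sum(
            sheet_result(by_criterion[criterion], weight)
            for criterion, weight in weights.items()
        )
    return overall


def best_product(overall):
    """Point 6: the qualified product with the highest overall result."""
    qualified = {p: r for p, r in overall.items() if r is not None}
    return max(qualified, key=qualified.get) if qualified else None
```

With these invented figures, DBMS 1 scores 23 and DBMS 2 scores 27, so DBMS 2
is selected. DBMS 3 has the highest raw scores but is excluded because it
lacks a mandatory feature, which is why mandatory features are checked before
any weighting is applied.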
DBMS EVALUATION CRITERIA 1

Basic  Functional  Capabilities 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Easy  Accessibility 

Programmer  Language 
Machine  Oriented 
Commercial 
Scientific 

End  User  Languages 
Commercial 
Scientific 

Data  Communications 
Support 

TOTAL 

PRIORITY  WEIGHT: 
MANDATORY  FEATURES 

RESULTS 

DBMS EVALUATION CRITERIA 2

Basic  Functional  Capabilities 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Multiple  Views  of  Data 
Sequential 
Random 
Indexed 

Multiple  Indices 

TOTAL 

PRIORITY  WEIGHT: 

RESULTS 

MANDATORY  FEATURES 

DBMS EVALUATION CRITERIA 3

Basic Functional Capabilities

DBMS 1
SCORE

DBMS 2
SCORE

Data Consolidation

Variable Length Entities
RG Types/Entity
Occurrences/RG Type
VL Occurrences
Nested Levels

Entity Relationships
Per DB
Per Entity
Per Program

TOTAL

PRIORITY WEIGHT:
MANDATORY FEATURES

RESULTS

DBMS EVALUATION CRITERIA 4

Data Independence

DBMS 1
SCORE

DBMS 2
SCORE

Levels of Mapping
Internal
Conceptual
External

Field Level Definition
Format Translation
Request Elements In

TOTAL

PRIORITY WEIGHT:

RESULTS

MANDATORY FEATURES
DATA INDEPENDENCE 5

DBMS EVALUATION CRITERIA 6

Data  Integrity 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Exclusive  Control 

Inter-Partition 
Lowest  Level 
Lowest  Isolated  Level 

Intra-Partition 
Lowest  Level 
Lowest  Isolated  Level 

Deadlock  Possible? 

Who  is  Responsible  for 
Establishing  Excl.  Contr. 
Maintaining  Isolation 
Resolving  Deadlocks 

TOTAL 

PRIORITY WEIGHT:
MANDATORY  FEATURES 

RESULTS 



DBMS  EVALUATION  CRITERIA  7 

Data  Integrity 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Recovery 

Logging  After  Images 
Responsibility 

Utility  Support 
Copy/Restore 
Log  Summarization 
Log  Application 
Other 

Smallest  Recoverable 
Unit 

TOTAL 

PRIORITY  WEIGHT: 
MANDATORY  FEATURES 

RESULTS 
DBMS EVALUATION CRITERIA 8

Data  Integrity 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Batch  Restart 

Logging  Before  Images 
Responsibility 

Utility  Support 
Log  Tape  Fix 
Backout 

Intermediate  Restart 
Points 

TOTAL 

PRIORITY  WEIGHT: 
MANDATORY  FEATURES 

RESULTS 

DBMS  EVALUATION  CRITERIA  9 

Data  Integrity 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

On  Line  Restart 

Message  Logging 
Responsibility 
Same  Log  as  DB 

Log  Synchronization 

Task  Restart 

System  Restart 

Messages  Reprocessed 
Same  Event  Sequence 

TOTAL 

PRIORITY  WEIGHT: 
MANDATORY  FEATURES 

RESULTS 

DBMS EVALUATION CRITERIA 10

Data  Security 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Data  Security 

Restriction 

Mechanism 

Level 

Access  Options 

Enforceability 

Is  Programmer  Involved? 

TOTAL 

PRIORITY  WEIGHT: 
MANDATORY  FEATURES 

RESULTS 

DBMS EVALUATION CRITERIA 11

Ease  of  Use 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

DBA  Tools 
DDL  Type 
Design  Aids 
Measurement  Aids 
Documentation  & 
Control  Aids 
DB  Restructuring  Aids 
Conversion  Aids 
Education 
Externals 

Internals 

Documentation 

User  Guides 

Reference 

Logic 

TOTAL 

PRIORITY  WEIGHT: 

RESULTS 

MANDATORY  FEATURES 
DBMS EVALUATION CRITERIA 12

Ease  of  Use 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Application  Programmer 

Data  Manipulation  Language 
Type 

Total  Operations 

Record  Types/Command 

Data  Search 

High,  Low,  Equal 
Boolean 

Multiple  Record  Types 

TOTAL 

PRIORITY  WEIGHT: 
MANDATORY  FEATURES 

RESULTS 

DBMS EVALUATION CRITERIA 13

Ease  of  Use 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

End  User 

User  Language  Support 
Batch/On  Line 
Retrieve/Update/Add/ 
Delete 

Applications  Support 

TOTAL 

PRIORITY  WEIGHT: 

RESULTS 

MANDATORY  FEATURES 

DBMS EVALUATION CRITERIA 14

Cost/Performance 

DBMS  1 
SCORE 

DBMS  2 
SCORE 

Measurable  Costs 
Price  of  Package 
Installation  Charges 
Maintenance  Charges 

Real  Memory  Required 
First  User 
Each  Additional 

TOTAL 

PRIORITY  WEIGHT: 
MANDATORY  FEATURES 

RESULTS 
DBMS EVALUATION CRITERIA 15


Cost/Performance 


DBMS  1 
SCORE 


DBMS  2 
SCORE 


Performance  Constraints 
DBMS  Architecture 
Single  or  Multi  Thread 
Lockout  Level 
Inter  Partition 
Intra  Partition 
Input/Output 
Buffer  Management 
Data  Grouping 
RG  Types/Dataset 
Datasets/Data  Entity 
Direct  Relationships 
Access  Method  Employed 
CPU 

Subset  Available 

Multiprocessor 

Distributed 


TOTAL 


PRIORITY  WEIGHT: 


MANDATORY  FEATURES 


RESULTS 


